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Introduction to Fundamentals of Signal Processing 


What is Digital Signal Processing? 


To understand what is Digital Signal Processing (DSP) let’s examine what 
does each of its words mean. “Signal” is any physical quantity that carries 
information. “Processing” is a series of steps or operations to achieve a 
particular end. It is easy to see that Signal Processing is used everywhere to 
extract information from signals or to convert information-carrying signals 
from one form to another. For example, our brain and ears take input speech 
signals, and then process and convert them into meaningful words. Finally, 
the word “Digital” in Digital Signal Processing means that the process is 
done by computers, microprocessors, or logic circuits. 


The field DSP has expanded significantly over that last few decades as a 
result of rapid developments in computer technology and integrated-circuit 
fabrication. Consequently, DSP has played an increasingly important role in 
a wide range of disciplines in science and technology. Research and 
development in DSP are driving advancements in many high-tech areas 
including telecommunications, multimedia, medical and scientific imaging, 
and human-computer interaction. 


To illustrate the digital revolution and the impact of DSP, consider the 
development of digital cameras. Traditional film cameras mainly rely on 
physical properties of the optical lens, where higher quality requires bigger 
and larger system, to obtain good images. When digital cameras were first 
introduced, their quality were inferior compared to film cameras. But as 
microprocessors become more powerful, more sophisticated DSP 
algorithms have been developed for digital cameras to correct optical 
defects and improve the final image quality. Thanks to these developments, 
the quality of consumer-grade digital cameras has now surpassed the 
equivalence in film cameras. As further developments for digital cameras 
attached to cell phones (cameraphones), where due to small size 
requirements of the lenses, these cameras rely on DSP power to provide 
good images. Essentially, digital camera technology uses computational 
power to overcome physical limitations. We can find the similar trend 


happens in many other applications of DSP such as digital communications, 
digital imaging, digital television, and so on. 


In summary, DSP has foundations on Mathematics, Physics, and Computer 
Science, and can provide the key enabling technology in numerous 
applications. 


Overview of Key Concepts in Digital Signal Processing 


The two main characters in DSP are signals and systems. A signal is 
defined as any physical quantity that varies with one or more independent 
variables such as time (one-dimensional signal), or space (2-D or 3-D 
signal). Signals exist in several types. In the real-world, most of signals are 
continuous-time or analog signals that have values continuously at every 
value of time. To be processed by a computer, a continuous-time signal has 
to be first sampled in time into a discrete-time signal so that its values at a 
discrete set of time instants can be stored in computer memory locations. 
Furthermore, in order to be processed by logic circuits, these signal values 
have to be quantized in to a set of discrete values, and the final result is 
called a digital signal. When the quantization effect is ignored, the terms 
discrete-time signal and digital signal can be used interchangeability. 


In signal processing, a system is defined as a process whose input and 
output are signals. An important class of systems is the class of linear time- 
invariant (or shift-invariant) systems. These systems have a remarkable 
property is that each of them can be completely characterized by an 
impulse response function (sometimes is also called as point spread 
function), and the system is defined by a convolution (also referred to as a 
filtering) operation. Thus, a linear time-invariant system is equivalent to a 
(linear) filter. Linear time-invariant systems are classified into two types, 
those that have finite-duration impulse response (FIR) and those that 
have an infinite-duration impulse response (IIR). 


A signal can be viewed as a vector in a vector space. Thus, linear algebra 
provides a powerful framework to study signals and linear systems. In 
particular, given a vector space, each signal can be represented (or 
expanded) as a linear combination of elementary signals. The most 


important signal expansions are provided by the Fourier transforms. The 
Fourier transforms, as with general transforms, are often used effectively to 
transform a problem from one domain to another domain where it is much 
easier to solve or analyze. The two domains of a Fourier transform have 
physical meaning and are called the time domain and the frequency 
domain. 


Sampling, or the conversion of continuous-domain real-life signals to 
discrete numbers that can be processed by computers, is the essential 
bridge between the analog and the digital worlds. It is important to 
understand the connections between signals and systems in the real world 
and inside a computer. These connections are convenient to analyze in the 
frequency domain. Moreover, many signals and systems are specified by 
their frequency characteristics. 


Because any linear time-invariant system can be characterized as a filter, 
the design of such systems boils down to the design the associated filters. 
Typically, in the filter design process, we determine the coefficients of an 
FIR or UR filter that closely approximates the desired frequency response 
specifications. Together with Fourier transforms, the z-transform provides 
an effective tool to analyze and design digital filters. 


In many applications, signals are conveniently described via statistical 
models as random signals. It is remarkable that optimum linear filters (in 
the sense of minimum mean-square error), so called Wiener filters, can 
be determined using only second-order statistics (autocorrelation and 
crosscorrelation functions) of a stationary process. When these statistics 
cannot be specified beforehand or change over time, we can employ 
adaptive filters, where the filter coefficients are adapted to the signal 
statistics. The most popular algorithm to adaptively adjust the filter 
coefficients is the least-mean square (LMS) algorithm. 


Signals Represent Information 
A brief discussion of information and signals. This module includes an introduction to the notion of continuous 
and discrete-time signals. 


Whether analog or digital, information is represented by the fundamental quantity in electrical engineering: the 
signal. Stated in mathematical terms, a signal is merely a function. Analog signals are continuous-valued; digital 
signals are discrete-valued. The independent variable of the signal could be time (speech, for example), space 
(images), or the integers (denoting the sequencing of letters and numbers in the football score). 


Analog Signals 


Analog signals are usually signals defined over continuous independent variable(s). Speech is produced by 
your vocal cords exciting acoustic resonances in your vocal tract. The result is pressure waves propagating in the 
air, and the speech signal thus corresponds to a function having independent variables of space and time and a 
value corresponding to air pressure: s(a, t) (Here we use vector notation « to denote spatial coordinates). When 
you record someone talking, you are evaluating the speech signal at a particular spatial location, ag say. An 
example of the resulting waveform s(a, t) is shown in this figure. 
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A speech signal's amplitude relates to tiny air pressure 
variations. Shown is a recording of the vowel "e" (as in 
"speech"). 


Photographs are static, and are continuous-valued signals defined over space. Black-and-white images have only 
one value at each point in space, which amounts to its optical reflection properties. In [link], an image is shown, 
demonstrating that it (and all other images as well) are functions of two independent spatial variables. 

Lena 


On the left is the classic Lena image, which is used ubiquitously as a test image. It contains straight and 
curved lines, complicated texture, and a face. On the right is a perspective display of the Lena image as a 
signal: a function of two spatial variables. The colors merely help show what signal values are about the same 
size. In this image, signal values range between 0 and 255; why is that? 


Color images have values that express how reflectivity depends on the optical spectrum. Painters long ago found 
that mixing together combinations of the so-called primary colors--red, yellow and blue--can produce very realistic 
color images. Thus, images today are usually thought of as having three values at every point in space, but a 
different set of colors is used: How much of red, green and blue is present. Mathematically, color pictures are 


multivalued--vector-valued--signals: s(a#) = (r(a)g(a)b(a))”. 


Interesting cases abound where the analog signal depends not on a continuous variable, such as time, but on a 
discrete variable. For example, temperature readings taken every hour have continuous--analog--values, but the 
signal's independent variable is (essentially) the integers. 


Digital Signals 


The word "digital" means discrete-valued and implies the signal has an integer-valued independent variable. 
Digital information includes numbers and symbols (characters typed on the keyboard, for example). Computers 
rely on the digital representation of information to manipulate and transform information. Symbols do not have a 
numeric value, and each is represented by a unique number. The ASCII character code has the upper- and 
lowercase characters, the numbers, punctuation marks, and various other symbols represented by a seven-bit 
integer. For example, the ASCII code represents the letter a as the number 97 and the letter A as 65. [link] shows 
the international convention on associating characters with integers. 


00 nul 01 soh 02 stx 03 etx 04 eot 05 eng 06 ac 


10 dle 11 del 12 dc2 13 dc3 14 dc4 15 nak 16 sy 


18 car 19 em 1A sub 1B esc 1C fs 1D gs 1E rs 
20 sp 21 ! 22 i 23 # 24 $ 25 % 26 & 
28 ( 29 ) 2A * 2B + 2C ; 2D - 2E 

30 0 31 1 32 2 33 3 34 4 35 5 36 6 
38 8 39 9 3A : 3B : 3C < 3D = 3E > 
40 @ 41 A 42 B 43 C 44 D 45 E 46 F 
48 H 49 I 4A J 4B K 4C L 4D M 4E N 
50 P 51 Q 52 R 53 S 54 T 55 U 56 Vv 
58 Xx 59 Y 5A Z 5B [ 5C \ 5D ] 5E A 
60 ; 61 a 62 b 63 c 64 d 65 e 66 f 
68 h 69 i 6A j 6B k 6C 1 6D m 6E n 
70 p 71 q 72 r 73 S 74 t 75 u 76 Vv 
78 x 79 y 7A vA 7B { 7C | 7D } 7E e 


ASCII Table The ASCII translation table shows how standard keyboard characters are represented by integers. In pa 
table displays first the so-called 7-bit code (how many characters in a seven-bit code?), then the character the numbe 
numeric codes are represented in hexadecimal (base-16) notation. Mnemonic characters correspond to control chara: 
may be familiar (like cr for carriage return) and some not (bel means a "bell"). 


Introduction to Systems 

Introduction to the concept of a system, which is a mechanism for 
manipulating signals. Feedback concepts and superpositions are also briefly 
mentioned. 


Signals are manipulated by systems. Mathematically, we represent what a 
system does by the notation y(t) = S(a(t)), with x representing the input 
signal and y the output signal. 

Definition of a system 


The system depicted 
has input x(t) and 
output y(t). 
Mathematically, 
systems operate on 
function(s) to 
produce other 
function(s). In many 
ways, systems are 
like functions, rules 
that yield a value for 
the dependent 
variable (our output 
signal) for each value 
of its independent 
variable (its input 
signal). The notation 
y(t) = S(a(t)) 
corresponds to this 
block diagram. We 
term $(-) the input- 
output relation for 
the system. 


This notation mimics the mathematical symbology of a function: A system's 
input is analogous to an independent variable and its output the dependent 
variable. For the mathematically inclined, a system is a functional: a 
function of a function (signals are functions). 


Simple systems can be connected together--one system's output becomes 
another's input--to accomplish some overall design. Interconnection 
topologies can be quite complicated, but usually consist of weaves of three 
basic interconnection forms. 


Cascade Interconnection 


cascade 


The most rudimentary ways of 
interconnecting systems are shown in 
the figures in this section. This is the 

cascade configuration. 


The simplest form is when one system's output is connected only to 
another's input. Mathematically, w(t) = S;(a(t)), and y(t) = S5(w(t)), 
with the information contained in x(t) processed by the first, then the 
second system. In some cases, the ordering of the systems matter, in others 
it does not. For example, in the fundamental model of communication the 
ordering most certainly matters. 


Parallel Interconnection 


parallel 


The parallel configuration. 


A signal x(t) is routed to two (or more) systems, with this signal appearing 
as the input to all systems simultaneously and with equal strength. Block 
diagrams have the convention that signals going to more than one system 
are not split into pieces along the way. Two or more systems operate on 
z(t) and their outputs are added together to create the output y(t). Thus, 
y(t) = S;(ax(t)) + S2(ax(t)), and the information in x(t) is processed 
separately by both systems. 


Feedback Interconnection 


feedback 


The feedback 
configuration. 


The subtlest interconnection configuration has a system's output also 
contributing to its input. Engineers would say the output is "fed back" to the 


input through system 2, hence the terminology. The mathematical statement 
of the feedback interconnection is that the feed-forward system produces 
the output: y(t) = S1(e(t)). The input e(t) equals the input signal minus 
the output of some other system's output to y(t): e(t) = x(t) — So(y(t)). 
Feedback systems are omnipresent in control problems, with the error 
signal used to adjust the output to achieve some condition defined by the 
input (controlling) signal. For example, in a car's cruise control system, 
z(t) is a constant representing what speed you want, and y(t) is the car's 
speed as measured by a speedometer. In this application, system 2 is the 
identity system (output equals input). 


Systems in the Time-Domain 


A discrete-time signal s(n) is delayed by np samples when we write s(n — no), with 
no > 0. Choosing no to be negative advances the signal along the integers. As 
opposed to analog delays, discrete-time delays can only be integer valued. In the 
frequency domain, delaying a signal corresponds to a linear phase shift of the signal's 
discrete-time Fourier transform: s(n — ng) + e @7F)g (en), 


Linear discrete-time systems have the superposition property. 
Equation: 
Superposition 


S(ayx1(n) + agxo(n)) = a,S(x1(n)) + a2S(x2(n)) 
A discrete-time system is called shift-invariant (analogous to time-invariant analog 


systems) if delaying the input delays the corresponding output. 
Equation: 


Shift-Invariant 


If S(ax(n)) = y(n), Then S(ax(n — no)) = y(n — no) 


We use the term shift-invariant to emphasize that delays can only have integer values 
in discrete-time, while in analog signals, delays can be arbitrarily valued. 


We want to concentrate on systems that are both linear and shift-invariant. It will be 
these that allow us the full power of frequency-domain analysis and implementations. 
Because we have no physical constraints in "constructing" such systems, we need 
only a mathematical specification. In analog systems, the differential equation 
specifies the input-output relationship in the time-domain. The corresponding 
discrete-time specification is the difference equation. 
Equation: 

The Difference Equation 


y(n) = ayy(n — 1) +... + apy(n — p) + box(n) + bia(n — 1) +... + bgx(n — @) 


Here, the output signal y(7) is related to its past values y(n — 1), 1 = {1,...,p}, and 
to the current and past values of the input signal 2(n). The system's characteristics 
are determined by the choices for the number of coefficients p and q and the 
coefficients’ values {a ,...,a,)} and {bo, by, ..., by}. 


Note:There is an asymmetry in the coefficients: where is ao ? This coefficient would 
multiply the y(m) term in the difference equation. We have essentially divided the 
equation by it, which does not change the input-output relationship. We have thus 
created the convention that ag is always one. 


As opposed to differential equations, which only provide an implicit description of a 
system (we must somehow solve the differential equation), difference equations 
provide an explicit way of computing the output for any input. We simply express the 
difference equation by a program that calculates each output from the previous output 
values, and the current and previous inputs. 


Discrete Time Convolution 

Convolution is a concept that extends to all systems that are both linear and 
time-invariant (LTT). It will become apparent in this discussion that this 
condition is necessary by demonstrating how linearity and time-invariance 
give rise to convolution. 


Introduction 


Convolution, one of the most important concepts in electrical engineering, 
can be used to determine the output a system produces for a given input 
signal. It can be shown that a linear time invariant system is completely 
characterized by its impulse response. The sifting property of the discrete 
time impulse function tells us that the input signal to a system can be 
represented as a sum of scaled and shifted unit impulses. Thus, by linearity, 
it would seem reasonable to compute of the output signal as the sum of 
scaled and shifted unit impulse responses. That is exactly what the 
operation of convolution accomplishes. Hence, convolution can be used to 
determine a linear time invariant system's output from knowledge of the 
input and the impulse response. 


Convolution and Circular Convolution 


Convolution 


Operation Definition 


Discrete time convolution is an operation on two discrete time signals 
defined by the integral 
Equation: 


(F*9)n]= 3° flag in — A 


k=—0o 


for all signals f, g defined on Z. It is important to note that the operation of 
convolution is commutative, meaning that 
Equation: 


f*9 = 9" f 


for all signals f, g defined on Z. Thus, the convolution operation could 
have been just as easily stated using the equivalent definition 
Equation: 


(F*9) In] = 3° f[n— Ho [A 


k=—0o 


for all signals f, g defined on Z. Convolution has several other important 
properties not listed here but explained and derived in a later module. 


Definition Motivation 


The above operation definition has been chosen to be particularly useful in 
the study of linear time invariant systems. In order to see this, consider a 
linear time invariant system H with unit impulse response h. Given a 
system input signal z we would like to compute the system output signal 
H(z). First, we note that the input can be expressed as the convolution 
Equation: 


by the sifting property of the unit impulse function. By linearity 
Equation: 


Since H(d|n — k]) is the shifted unit impulse response h[n — k], this gives 
the result 
Equation: 


Hence, convolution has been defined such that the output of a linear time 
invariant system is given by the convolution of the system input with the 
system unit impulse response. 


Graphical Intuition 


It is often helpful to be able to visualize the computation of a convolution in 
terms of graphical processes. Consider the convolution of two functions 


f,g given by 
Equation: 


(F*9) [In] = 3° flAlgin—&] = > fm — Mol. 


k=—0o k=—0o 


The first step in graphically understanding the operation of convolution is to 
plot each of the functions. Next, one of the functions must be selected, and 
its plot reflected across the k = 0 axis. For each real n, that same function 
must be shifted left by n. The point-wise product of the two resulting plots 
is then computed, and then all of the values are summed. 


Example: 
Recall that the impulse response for a discrete time echoing feedback 
system with gain a is 


Equation: 
h|n] =a"u [nl], 


and consider the response to an input signal that is another exponential 
Equation: 


We know that the output for this input is given by the convolution of the 
impulse response with the input signal 
Equation: 


y|n] = x[n|*h[n). 


We would like to compute this operation by beginning in a way that 
minimizes the algebraic complexity of the expression. However, in this 
case, each possible choice is equally simple. Thus, we would like to 
compute 

Equation: 


y|n| = S a*u [k]b” *u [n — ky. 


k=—o0o 
The step functions can be used to further simplify this sum. Therefore, 
Equation: 
y|n] = 0 


forn < 0 and 
Equation: 


nfl => ak 
k=0 


for n > 0. Hence, provided ab ¥ 1, we have that 
Equation: 


Circular Convolution 


Discrete time circular convolution is an operation on two finite length or 
periodic discrete time signals defined by the sum 
Equation: 


(f ®g) [n] = S~ f [kg [n — 


for all signals f, g defined on Z[0, N — 1] where f, g are periodic 
extensions of f and g. It is important to note that the operation of circular 
convolution is commutative, meaning that 

Equation: 


f®eg=g9ef 


for all signals f, g defined on Z[0, N — 1]. Thus, the circular convolution 
operation could have been just as easily stated using the equivalent 
definition 

Equation: 


(f ®g)[n] = Sf [n — kg [ki] 


for all signals f, g defined on Z[0, N — 1] where f, g are periodic 
extensions of f and g. Circular convolution has several other important 
properties not listed here but explained and derived in a later module. 


Alternatively, discrete time circular convolution can be expressed as the 
sum of two summations given by 


Equation: 
n N-1 
(f@g)[n]=)_ flklgin—k]+ S° flkigin—-k+N] 
k=0 k=n+1 


for all signals f, g defined on Z[0, N — 1). 


Meaningful examples of computing discrete time circular convolutions in 
the time domain would involve complicated algebraic manipulations 
dealing with the wrap around behavior, which would ultimately be more 
confusing than helpful. Thus, none will be provided in this section. Of 
course, example computations in the time domain are easy to program and 
demonstrate. However, disrete time circular convolutions are more easily 
computed using frequency domain tools as will be shown in the discrete 
time Fourier series section. 


Definition Motivation 


The above operation definition has been chosen to be particularly useful in 
the study of linear time invariant systems. In order to see this, consider a 
linear time invariant system H with unit impulse response h. Given a 
periodic system input signal x we would like to compute the system output 
signal H(a). First, we note that the input can be expressed as the circular 
convolution 

Equation: 


by the sifting property of the unit impulse function. By linearity, 
Equation: 


Since H(6|n — k]) is the shifted unit impulse response h[n — k], this gives 
the result 
Equation: 


Hence, circular convolution has been defined such that the output of a linear 
time invariant system is given by the convolution of the system input with 
the system unit impulse response. 


Graphical Intuition 


It is often helpful to be able to visualize the computation of a circular 
convolution in terms of graphical processes. Consider the circular 
convolution of two finite length functions f, g given by 

Equation: 


(f ® 9) [n] = > FRG in —k] = SO F In — kG IK. 


The first step in graphically understanding the operation of convolution is to 
plot each of the periodic extensions of the functions. Next, one of the 
functions must be selected, and its plot reflected across the k = 0 axis. For 
each n € Z|0, N — 1], that same function must be shifted left by n. The 
point-wise product of the two resulting plots is then computed, and finally 
all of these values are summed. 


Interactive Element 


vtimeshiftDemo 


Interact (when online) with the Mathematica CDF 
demonstrating Discrete Linear Convolution. To 
download, right click and save file as .cdf 


Convolution Summary 


Convolution, one of the most important concepts in electrical engineering, 
can be used to determine the output signal of a linear time invariant system 
for a given input signal with knowledge of the system's unit impulse 
response. The operation of discrete time convolution is defined such that it 
performs this function for infinite length discrete time signals and systems. 
The operation of discrete time circular convolution is defined such that it 
performs this function for finite length and periodic discrete time signals. In 
each case, the output of the system is the convolution or circular 
convolution of the input signal with the unit impulse response. 


Review of Linear Algebra 


Vector spaces are the principal object of study in linear algebra. A vector 
space is always defined with respect to a field of scalars. 


Fields 


A field is a set F’ equipped with two operations, addition and mulitplication, 
and containing two special members 0 and 1 (0 ~ 1), such that for all 
{a,b,cleF 


1 l(ea+beF 
2.a+b=b+a 
3.(a+b)+c=a+(b+c) 
4.a+0=a 
5. there exists —a such that a ++ —a = 0 


labe F 

2.ab = ba 

3. (ab)c = a (bc) 

AG. 6 

5. there exists a! such that aa~! = 1 


3.a(b+c)=ab+ac 
More concisely 
1. F is an abelian group under addition 


2. Fis an abelian group under multiplication 
3. multiplication distributes over addition 


Examples 


Q,R,C 


Vector Spaces 


Let F' be a field, and V a set. We say V is a vector space over F if there 
exist two operations, defined for alla € F,uw € V andv € V: 


e vector addition: (u,v) - (w+v) EV 
e scalar multiplication: (a,v) > av € V 


and if there exists an element denoted 0 € V, such that the following hold 
foralac F,b€ Fyanduwe V,veEV,andweVv 


1 lut+(v+w)=(ut+v)+w 
2.uU+v=vU+U 


3.u+0=u 
4. there exists —w such that w+ —u — 0 


More concisely, 


1. V is an abelian group under plus 
2. Natural properties of scalar multiplication 


Examples 


¢ RY” isa vector space over R 

¢ C% is a vector space over C 
Cer R 

° is a vector space over 

¢ RY” is not a vector space over C 


The elements of V are called vectors. 


Euclidean Space 


Throughout this course we will think of a signal as a vector 


The samples {;} could be samples from a finite duration, continuous time 
signal, for example. 


A signal will belong to one of two vector spaces: 


Real Euclidean space 


az € RY (over R) 


Complex Euclidean space 


x € CNX (over C) 


Subspaces 
Let V be a vector space over F’. 


A subset S C V is called a subspace of V if S is a vector space over F' in 
its own right. 


Example: 
V = R?, F=R, S = any line though the origin. 


Ay 


—> X 


A 


S is any line through the origin. 


Are there other subspaces? 


S C V is a subspace if and only if for all a € F and b € F and for all 
s€Sandt€S,(as+bt)eS 


Linear Independence 
Let u1,...,u, € V. 


We say that these vectors are linearly dependent if there exist scalars 
a1,...,@a,% € F such that 


Equation: 
k 
> aQa{,Uy = 0 
i=l 


and at least one a; # 0. 


If [link] only holds for the case ay = ... = az = O,7 we say that the vectors 
are linearly independent. 


Example: 
1 —2 =o 
UP ee Se i) 
2 0 —2 


so these vectors are linearly dependent in R°. 


Spanning Sets 


Consider the subset S = {v1,v2,...,v%}. Define the span of S 


wer 


k 
< S$ >=span(S) = {Soa 
i=l 


Fact: < S > is a subspace of V. 


Example: 


V=R? F-R, S= 101,00},07 = 10 |,0 = |) 1)/> 


< S$ >= xy-plane. 


< S$ > is the xy-plane. 


Aside 


If S is infinite, the notions of linear independence and span are easily 
generalized: 


We say S is linearly independent if, for every finite collection 
U1,---,UK € S, (k arbitrary) we have 


k 
i=1 


The span of Sis 


k 
a 15 = {oa 


dl 


acFAwES A ie<c)| 


Note:In both definitions, we only consider finite sums. 


Bases 


A set B C V is called a basis for V over F if and only if 


1. B is linearly independent 
2-<—b = 


Bases are of fundamental importance in signal processing. They allow us to 


decompose a signal into building blocks (basis vectors) that are often more 
easily understood. 


Example: 
V = (real or complex) Euclidean space, R® or CX. 


B= {ej,...,en} = standard basis 


where the 1 is in the i*" position. 


Example: 
V =CN% over C. 


which is the DFT basis. 


Uk = 


where 2 = VY —l. 


Key Fact 


If Bis a basis for V, then every v € V can be written uniquely (up to order 
of terms) in the form 


where a; € F' and v; € B. 


Other Facts 


e If Sis a linearly independent set, then S' can be extended to a basis. 
e If < S >= V, then S contains a basis. 


Dimension 


Let V be a vector space with basis B. The dimension of V, denoted 
dim (V), is the cardinality of B. 


Every vector space has a basis. 
Every basis for a vector space has the same cardinality. 
= dim (V) is well-defined. 


If dim (V) < oo, we say V is finite dimensional. 


Examples 
vector space field of scalars dimension 
RY R 
Cy C 
ce R 


Every subspace is a vector space, and therefore has its own dimension. 


Example: 
Suppose (S = {u1,...,ux}) C V isa linearly independent set. Then 


came S25) 


Facts 


e If S is a subspace of V, then dim (S) < dim(V). 
e Ifdim(S) = dim(V) < o, thn S=V. 


Direct Sums 
Let V be a vector space, and let S C V and T C V be subspaces. 


We say V is the direct sum of S and 7’, written V = S @ T, if and only if 
for every v © V, there exist unique s € S andt € TJ’ such that v = s+ f. 


If V = S @T, then T is called a complement of S. 


Example: 
V=C' ={f:R—RJfis continuous} 
S = even funcitons inC’ 


T = odd funcitons inC’ 
il i 


BA ee) na ae) Glaciog IN) lilt) 


heap Sevohes eso) Aa oe Seley Ge ee We sie ee ion 
g—g' =h' — his odd and even, which implies g = g’ andh = h’. 


Facts 


1. Every subspace has a complement 
2.V = S@T if and only if 


LSor= {ot 
2.4 5.1 S=V 


3.1fV = S@T, and dim (V) < ov, then 
dim (V) = dim (S$) + dim (T) 


Proofs 


Invoke a basis. 


Norms 


Let V be a vector space over F’. A norm is amapping V — F’, denoted by 
|| + ||, such that forallu €V,v Ee V,andAcCF 


1. || uw ||> Oifu ~0 
2. || Aw ||= |A| || e || 
3. [utes ull+| oll 


Examples 


Euclidean norms: 


2 eR: 


xecn: 


Induced Metric 
Every norm induces a metric on V 
d(u,v) =||u—v | 


which leads to a notion of "distance" between vectors. 


Inner products 


Let V be a vector space over F’, F = Ror C. An inner product is a 
mapping V x V -—> F, denoted (-, -), such that 


1. (v,v) > 0, and (v,v) =O Sv=0 
2. (U,V) = (v,u) 
3. (au + bv, w) = a ((u,w)) + b((v, w)) 


Examples 
RY” over R: 
N 
(x,y) c= (a7 y) = So iy: 
i=l 
CX over C: 


8 
I 


is called the "Hermitian," or "conjugate transpose" of a. 


Triangle Inequality 
If we define || w ||= (w, w), then 
wt v |[<l] ul] + |e | 


Hence, every inner product induces a norm. 


Cauchy-Schwarz Inequality 
ForallweV,ve V, 
(u,v)| <|[ w |] || v || 


In inner product spaces, we have a notion of the angle between two vectors: 


(tu v) = arceos( 7) |) € [0, 2m) 


| w | |e || 


Orthogonality 
uw and v are orthogonal if 

(u,v) =0 
Notation: wu L v. 


If in addition || w |/=|| v ||= 1, we say wu and v are orthonormal. 


In an orthogonal (orthonormal) set, each pair of vectors is orthogonal 
(orthonormal). 


Ay 


= X 


Orthogonal vectors in R?. 


Orthonormal Bases 
An Orthonormal basis is a basis {v; } such that 


lifi=j 


a 
Min Vi) = 949 ree 


Example: 
The standard basis for R% or C” 


Example: 
The normalized DFT basis 


Uk = 


ue 
VN 


Expansion Coefficients 


If the representation of v with respect to {v;} is 
v= » AYU; 
i 
then 
GH Uw) 


Gram-Schmidt 


Every inner product space has an orthonormal basis. Any (countable) basis 
can be made orthogonal by the Gram-Schmidt orthogonalization process. 


Orthogonal Compliments 


Let S C V bea subspace. The orthogonal compliment S is 


St={ulueV A ((u,v) =0) A Vu: (ve S)} 
S'+ is easily seen to be a subspace. 


If dim (v) < 00, then V= SQ S". 


Note:If dim (v) = 00, then in order to have V = S$ @ S+ we require V to 
be a Hilbert Space. 


Linear Transformations 


Loosely speaking, a linear transformation is a mapping from one vector 
space to another that preserves vector space operations. 


More precisely, let V, W be vector spaces over the same field F’. A linear 
transformation is a mapping 7’: V — W such that 


T(au + bv) = aT(u) + bT(v) 
foralac F,b€ Fanduc V,veEV. 


In this class we will be concerned with linear transformations between (real 
or complex) Euclidean spaces, or subspaces thereof. 


Image 


image (T) = {w| we W A T(v) = wfor somev} 


Nullspace 


Also known as the kernel: 


ker (T) = {vj v EV A (T(v) = 90)} 


Both the image and the nullspace are easily seen to be subspaces. 


Rank 


rank (T) = dim (image (T)) 


Nullity 


null (7) = dim (ker (T')) 


Rank plus nullity theorem 


rank (TJ) + null (7) = dim (V) 


Matrices 


Every linear transformation T' has a matrix representation. If 
T:EX ~ E”“,E=RorC, then T is represented by an M x N matrix 


Q@M1 --- QA@MN 


where (aj;...ayi)’ = T(e;) and e; = (0...1...0)" is the i** standard 
basis vector. 


Note:A linear transformation can be represented with respect to any bases 
of EN and E™, leading to a different A. We will always represent a linear 
transformation using the standard bases. 


Column span 


colspan (A) =< A >=image (A) 


Duality 


If A: RY > R™, then 


ker+(A) =image (A*) 


HA<:<C* —-C™ , then 


ker~ (A) =image (A®) 


Inverses 


The linear transformation/matrix A is invertible if and only if there exists a 
matrix B such that AB = BA = I (identity). 


Only square matrices can be invertible. 
Let A: FX — F% be linear, F = R or C. The following are equivalent: 
1. A is invertible (nonsingular) 


2.rank(A) = N 
3. null(A) = 0 


4.det A #0 
5. The columns of A form a basis. 


If A~' = A? (or A# in the complex case), we say A is orthogonal (or 
unitary). 


Hilbert Spaces 
This module will provide an introduction to the concepts of Hilbert spaces. 


Hilbert Spaces 


A vector space § with a valid inner product defined on it is called an inner 
product space, which is also a normed linear space. A Hilbert space is 
an inner product space that is complete with respect to the norm defined 
using the inner product. Hilbert spaces are named after David Hilbert, who 
developed this idea through his studies of integral equations. We define our 
valid norm using the inner product as: 


Equation: 
| @ I= y/(@, 2) 


Hilbert spaces are useful in studying and generalizing the concepts of 
Fourier expansion, Fourier transforms, and are very important to the study 
of quantum mechanics. Hilbert spaces are studied under the functional 
analysis branch of mathematics. 


Examples of Hilbert Spaces 


Below we will list a few examples of Hilbert spaces. You can verify that 
these are valid inner products at home. 


e For C”, 
LO 
- A al n—l1 
Sy =—y 2= We or <« Baa) =) 2; 
1-0 
Ln-1 


e Space of finite energy complex functions: L?(R) 


if.9)= | © f(t)g(t) dt 


¢ Space of square-summable sequences: £7(Z) 


CO 


(z,y)= >> afilyfé] 


1=—00O 


Signal Expansions 

The module looks at decomposing signals through orthonormal basis 
expansion to provide an alternative representation. The module presents 
many examples of solving these problems and looks at them in several 
spaces and dimensions. 


Main Idea 


When working with signals many times it is helpful to break up a signal 
into smaller, more manageable parts. Hopefully by now you have been 
exposed to the concept of eigenvectors and there use in decomposing a 
signal into one of its possible basis. By doing this we are able to simplify 
our calculations of signals and systems through eigenfunctions of LTT 
systems. 


Now we would like to look at an alternative way to represent signals, 
through the use of orthonormal basis. We can think of orthonormal basis 
as a set of building blocks we use to construct functions. We will build up 
the signal/vector as a weighted sum of basis elements. 


Example: 

The complex sinusoids em for all —coo < n < oo form an 
orthonormal basis for L?((0, 71). 

In our Fourier series equation, f(t) = >>, 
another representation of f(t). 


wont 


cre, the {c,,} are just 


Note: For signals/vectors in a Hilbert Space, the expansion coefficients are 
easy to find. 


Alternate Representation 


Recall our definition of a basis: A set of vectors {b; } in a vector space S is 
a basis if 


1. The 6; are linearly independent. 
2. The b; span S. That is, we can find {a;}, where a; € C (scalars) such 


that 
Equation: 


Ve,xr ES: (« a Yo] 


where a is a vector in S, a is a scalar in C, and b is a vector in S. 


Condition 2 in the above definition says we can decompose any vector in 
terms of the {b;}. Condition 1 ensures that the decomposition is unique 
(think about this at home). 


Note: The {a} provide an alternate representation of x. 


Example: 
Let us look at simple example in R?, where we have the following vector: 


il 
-( 
Standard Basis: {e9,e;} = { (10)", (01) 


x= eo + 2e) 


Alternate Basis: {ho, hy} = cane (1-1)"} 


In general, given a basis {bg, 6; } and a vector x € R?, how do we find the 
Qo and a such that 
Equation: 


4 bie aobo == ab, 


Finding the Coefficients 


Now let us address the question posed above about finding a's in general 
for R?. We start by rewriting [link] so that we can stack our 6,;'s as columns 
in a 2x2 matrix. 


Equation: 
(a) = a (bo) + a1 (61) 
Equation: 
(2) =| b0 | (2°) 
a1 
Example: 


Here is a simple example, which shows a little more detail about the above 
equations. 
Equation: 


(Sa) = oC) + (ora) 
agbo|0] SP ab [0] 
Ga qo ay 


Equation: 


Simplifying our Equation 


To make notation simpler, we define the following two items from the 
above equations: 


e Basis Matrix: 


e Coefficient Vector: 


a0 
a = 
1 
This gives us the following, concise equation: 
Equation: 


z— Ba 


which is equivalent to # = SS a;b;. 


Example: 


1 0 
Given a standard basis, { (5) ; ()) \ then we have the following basis 


e=(1 9) 


matrix: 


To get the a's, we solve for the coefficient vector in [link] 
Equation: 


a—B'z 


Where B~! is the inverse matrix of B. 
Examples 


Example: 
Let us look at the standard basis first and try to calculate a from it. 


Where I is the identity matrix. In order to solve for & let us find the 
inverse of B first (which is obviously very trivial in this case): 


pi 9 
Hoo 


Therefore we get, 


Example: 
Let us look at a ever-so-slightly more complicated basis of 


ik ik 
‘ ()) ; ( :) } = {ho, hi} Then our basis matrix and inverse basis 


matrix becomes: 


Now we solve for @ 


an ten | 


dR ple 


and we get 


Exercise: 


Problem: 


Now we are given the following basis matrix and x: 


ewar-{()-0)} 
-() 


For this problem, make a sketch of the bases and then represent 2 in 
terms of bo and bj. 


Solution: 


In order to represent x in terms of bo and 6b; we will follow the same 
steps we used in the above example. 


And now we can write 2 in terms of bo and 6j. 
2 
4 bs bo + ae 


And we can easily substitute in our known values of bp and 6; to 
verify our results. 


Note: A change of basis simply looks at x from a "different perspective." 
B~ transforms z from the standard basis to our new basis, {bo, 61}. 
Notice that this is a totally mechanical procedure. 


Extending the Dimension and Space 


We can also extend all these ideas past just IR? and look at them in R” and 
C”. This procedure extends naturally to higher (> 2) dimensions. Given a 

basis {bo, b1,...,bn_1} for R”, we want to find {a9, a1, ..., @n—1} such 
that 

Equation: 


z= agbop + a 1b, +... + An_1bn_-1 


Again, we will set up a basis matrix 
B=(bo bi be ... bn-1) 


where the columns equal the basis vectors and it will always be an nxn 
matrix (although the above matrix does not appear to be square since we 
left terms in vector notation). We can then proceed to rewrite [link] 


Qo 
x = (bo by ee bn—1) = Ba 


An-1 


and 


Introduction to Fourier Analysis 
Lists the four Fourier transforms and when to use them. 


Fourier's Daring Leap 


Fourier postulated around 1807 that any periodic signal (equivalently finite 
length signal) can be built up as an infinite linear combination of harmonic 
sinusoidal waves. 


i.e. Given the collection 
Equation: 


on ,) © 
B= {etl 
n=—0o 


any 
Equation: 


f(t) € L* [0,T) 


can be approximated arbitrarily closely by 
Equation: 


f()= 3 C, eft, 


n=— Oo 


Now, The issue of exact convergence did bring Fourier much criticism from 
the French Academy of Science (Laplace, Lagrange, Monge and LaCroix 
comprised the review committee) for several years after its presentation on 
1807. It was not resolved for also a century, and its resolution is interesting 
and important to understand from a practical viewpoint. See more in the 
section on Gibbs Phenomena. 


Fourier analysis is fundamental to understanding the behavior of signals 
and systems. This is a result of the fact that sinusoids are Eigenfunctions of 
linear, time-invariant (LT1) systems. This is to say that if we pass any 
particular sinusoid through a LTI system, we get a scaled version of that 
same sinusoid on the output. Then, since Fourier analysis allows us to 
redefine the signals in terms of sinusoids, all we need to do is determine 
how any given system effects all possible sinusoids (its transfer function) 
and we have a complete understanding of the system. Furthermore, since 
we are able to define the passage of sinusoids through a system as 
multiplication of that sinusoid by the transfer function at the same 
frequency, we can convert the passage of any signal through a system from 
convolution (in time) to multiplication (in frequency). These ideas are what 
give Fourier analysis its power. 


Now, after hopefully having sold you on the value of this method of 
analysis, we must examine exactly what we mean by Fourier analysis. The 
four Fourier transforms that comprise this analysis are the Fourier Series, 
Continuous-Time Fourier Transform, Discrete-Time Fourier Transform and 
Discrete Fourier Transform. For this document, we will view the Laplace 
Transform and Z-Transform as simply extensions of the CTFT and DTFT 
respectively. All of these transforms act essentially the same way, by 
converting a signal in time to an equivalent signal in frequency (sinusoids). 
However, depending on the nature of a specific signal i.e. whether it is 
finite- or infinite-length and whether it is discrete- or continuous-time) there 
is an appropriate transform to convert the signal into the frequency domain. 
Below is a table of the four Fourier transforms and when each is 
appropriate. It also includes the relevant convolution for the specified 
space. 


Frequency 
Transform Time Domain Domain Convolution 


Transform 


Continuous- 
Time 
Fourier 
Series 


Continuous- 
Time 
Fourier 
Transform 


Discrete- 
Time 
Fourier 
Transform 


Discrete 
Fourier 
Transform 


Time Domain 


L*((0,T)) 


L*( ) 


PC ) 


?([0, N — 1) 


Table of Fourier Representations 


Frequency 
Domain 


PC ) 


L*( ) 


L?({0, 277)) 


?([0, N — 1) 


Convolution 


Continuous- 
Time 
Circular 


Continuous- 
Time Linear 


Discrete- 
Time Linear 


Discrete- 
Time 
Circular 


Continuous Time Fourier Transform (CTFT) 
Details the Continuous-Time Fourier Transform. 


Introduction 


In this module, we will derive an expansion for any arbitrary continuous- 
time function, and in doing so, derive the Continuous Time Fourier 
Transform (CTFT). 


Since complex exponentials are eigenfunctions of linear time-invariant 
(LTI) systems, calculating the output of an LTI system # given e™ as an 
input amounts to simple multiplication, where H(s) € C is the eigenvalue 
corresponding to s. As shown in the figure, a simple exponential input 
would yield the output 

Equation: 


[missing_resource: simpleLTIsys.png] 


Using this and the fact that # is linear, calculating y(t) for combinations 
of complex exponentials is also straightforward. 


cje*"’ + cpe®* —> c1H(s;)e*" + coH(s2)e*™ 


S> Zer = > CnH(sn)e*"" 
n n 


The action of H on an input such as those in the two equations above is 
easy to explain. # independently scales each exponential component e*"’ 
by a different complex number H(s,,) € C. As such, if we can write a 
function f(t) as a combination of complex exponentials it allows us to 
easily calculate the output of a system. 


Now, we will look to use the power of complex exponentials to see how we 
may represent arbitrary signals in terms of a set of simpler functions by 
superposition of a number of complex exponentials. Below we will present 


the Continuous-Time Fourier Transform (CTFT), commonly referred to 
as just the Fourier Transform (FT). Because the CTFT deals with 
nonperiodic signals, we must find a way to include all real frequencies in 
the general equations. For the CTFT we simply utilize integration over real 
numbers rather than summation over integers in order to express the 
aperiodic signals. 


Fourier Transform Synthesis 


Joseph Fourier demonstrated that an arbitrary s(t) can be written as a linear 
combination of harmonic complex sinusoids 


Equation: 
= . 
s(t) = S- eer 
n=—0o 
where wo = 22 is the fundamental frequency. For almost all s(t) of 


practical interest, there exists c,, to make [link] true. If s(t) is finite energy ( 
s(t) € L?(0,7)), then the equality in [link] holds in the sense of energy 
convergence; if s(t) is continuous, then [link] holds pointwise. Also, if s(t) 
meets some mild conditions (the Dirichlet conditions), then [link] holds 
pointwise everywhere except at points of discontinuity. 


The c,, - called the Fourier coefficients - tell us "how much" of the sinusoid 
eJont is in s(t). The formula shows s(t) as a sum of complex exponentials, 
each of which is easily processed by an LTT system (since it is an 
eigenfunction of every LTI system). Mathematically, it tells us that the set 
of complex exponentials {Vn, neZ: (esvont) } form a basis for the space 
of T-periodic continuous time functions. 


Equations 


Now, in order to take this useful tool and apply it to arbitrary non-periodic 
signals, we will have to delve deeper into the use of the superposition 


principle. Let s7 (t) be a periodic signal having period T’. We want to 
consider what happens to this signal's spectrum as the period goes to 
infinity. We denote the spectrum for any assumed value of the period by 
Cn (1). We calculate the spectrum according to the Fourier formula for a 
periodic signal, known as the Fourier Series (for more on this derivation, 
see the section on Fourier Series.) 

Equation: 


1 rr 
C= = / s(t) exp (—1wot) dt 
L’ Jo 


where wo = x and where we have used a symmetric placement of the 
integration interval about the origin for subsequent derivational 
convenience. We vary the frequency index n proportionally as we increase 
the period. Define 

Equation: 


St (f) =Tcen = zi (sr (f) exp (1wot) dt 


making the corresponding Fourier Series 
Equation: 


sn (t) = SF (t) exp (wot) x) 


As the period increases, the spectral lines become closer together, becoming 
a continuum. Therefore, 
Equation: 


lim sr (t) = s(t) = [sp exp (1wot) df 


with 


Equation: 
S(f) = / Olena a 
Equation: 
Continuous-Time Fourier Transform 
F(2) = / f(t)e-@) at 
Equation: 


Inverse CTFT 


j= a / ” F(Q)ei™ a2 


Note:It is not uncommon to see the above formula written slightly 
different. One of the most common differences is the way that the 
exponential is written. The above equations use the radial frequency 
variable {2 in the exponential, where {2 = 27f, but it is also common to 
include the more explicit expression, 727 ft, in the exponential. Click here 
for an overview of the notation used in Connexion's DSP modules. 


Example: 
We know from Euler's formula that . 
cos (wt)+ sin (wt) = 4 ejut oa Fd Jet 


CTFT Definition Demonstration 


#:CTFTDemo 
Interact (when online) with a Mathematica 
CDF demonstrating Continuous Time Fourier 


Transform. To Download, right-click and save 
as .cdf. 


Example Problems 


Exercise: 


Problem: Find the Fourier Transform (CTFT) of the function 


Equation: 
—(ot) if t>0 
e i 
f(t) = { —. 
0 otherwise 


Solution: 


In order to calculate the Fourier transform, all we need to use is [link], 
complex exponentials, and basic calculus. 
Equation: 


BD) = SP F(tle™ at 
= { B= (at) 6 — (it) dt 
= fo e\- )(a+iQ) dt 


=I 
= 0- ati 
Equation: 
1 
Q ——_ 
(2) a+ifd 
Exercise: 
Problem: 


Find the inverse Fourier transform of the ideal lowpass filter defined 


by 
Equation: 
1if |Q|}<M 
X(2) = : 
0 otherwise 
Solution: 


Here we will use [link] to find the inverse FT given that t ~ 0. 
Equation: 


a(t) = + fy, eda 


1 i(O,t . 
ae" | 2,0=e 


= + sin(Mt) 


Equation: 


Fourier Transform Summary 


Because complex exponentials are eigenfunctions of LTI systems, it is often 
useful to represent signals using a set of complex exponentials as a basis. 
The continuous time Fourier series synthesis formula expresses a 
continuous time, periodic function as the sum of continuous time, discrete 
frequency complex exponentials. 

Equation: 


j= Se 


n=—CO 


The continuous time Fourier series analysis formula gives the coefficients 
of the Fourier series expansion. 
Equation: 


1 


T 
eco —(jwont) 
cn = Tp ; f(tye dt 


In both of these equations wp = oh is the fundamental frequency. 


Discrete Time Fourier Transform (DTFT) 
Details the discrete-time fourier transform. 


Introduction 


In this module, we will derive an expansion for arbitrary discrete-time 
functions, and in doing so, derive the Discrete Time Fourier Transform 
(DTFT). 


Since complex exponentials are eigenfunctions of linear time-invariant 
(LTI) systems, calculating the output of an LTI system # given e“” as an 
input amounts to simple multiplication, where wy = ae , and where 

Hk] € C is the eigenvalue corresponding to k. As shown in the figure, a 
simple exponential input would yield the output 


Equation: 


Simple LTI system. 


Using this and the fact that # is linear, calculating y[n] for combinations 
of complex exponentials is also straightforward. 


ger" +o6°"S c, H{kyJe"” + co H{ky]e” 


S° eer = S° cH {kije™” 
I I 


The action of H on an input such as those in the two equations above is 
easy to explain. # independently scales each exponential component 
eI" by a different complex number H|k;| € C. As such, if we can write a 
function y|n] as a combination of complex exponentials it allows us to 
easily calculate the output of a system. 


Now, we will look to use the power of complex exponentials to see how we 
may represent arbitrary signals in terms of a set of simpler functions by 
superposition of a number of complex exponentials. Below we will present 
the Discrete-Time Fourier Transform (DTFT). Because the DTFT deals 
with nonperiodic signals, we must find a way to include all real frequencies 
in the general equations. For the DTFT we simply utilize summation over 
all real numbers rather than summation over integers in order to express the 
aperiodic signals. 


DTFT synthesis 


It can be demonstrated that an arbitrary Discrete Time-periodic function 
f|n] can be written as a linear combination of harmonic complex sinusoids 
Equation: 


N-1 
fin] = Senet 
k=0 


where wo = 32 is the fundamental frequency. For almost all f[n] of 
practical interest, there exists c,, to make [link] true. If f[n] is finite energy 
( f[n] € L?[0, N}), then the equality in [link] holds in the sense of energy 
convergence; with discrete-time signals, there are no concerns for 
divergence as there are with continuous-time signals. 


The c,, - called the Fourier coefficients - tell us "how much" of the sinusoid 
eJokn is in f[n]. The formula shows f[n] as a sum of complex 
exponentials, each of which is easily processed by an LTI system (since it is 
an eigenfunction of every LTI system). Mathematically, it tells us that the 


set of complex exponentials {Vk, keZ: (eter) } form a basis for the 
space of N-periodic discrete time functions. 


Equations 


Now, in order to take this useful tool and apply it to arbitrary non-periodic 
signals, we will have to delve deeper into the use of the superposition 
principle. Let s7 (t) be a periodic signal having period T’. We want to 
consider what happens to this signal's spectrum as the period goes to 
infinity. We denote the spectrum for any assumed value of the period by 
Cn (T’). We calculate the spectrum according to the Fourier formula for a 
periodic signal, known as the Fourier Series (for more on this derivation, 
see the section on Fourier Series.) 

Equation: 


= ang f et ) exp (—1w ot) dt 


where Wo = - and where we have used a symmetric placement of the 
integration interval about the origin for subsequent derivational 
convenience. We vary the frequency index n proportionally as we increase 
the period. Define 


Equation: 


Sr(f) =Tc, = =| (sr (f) exp (1wot) dt 


making the corresponding Fourier Series 
Equation: 


=o exp (1wot) *) 


As the period increases, the spectral lines become closer together, becoming 
a continuum. Therefore, 


Equation: 
jim sr (t)=s(t)= [so exp (1wot) df 

with 
Equation: 

S(f) = / Oana 
Equation: 

Discrete-Time Fourier Transform 

Fa= > fine 
Equation: 
Inverse DTFT 


fin| = af F(wye" dw 


Note:It is not uncommon to see the above formula written slightly 
different. One of the most common differences is the way that the 
exponential is written. The above equations use the radial frequency 
variable w in the exponential, where w = 27f, but it is also common to 
include the more explicit expression, 727 ft, in the exponential. Sometimes 
DTFT notation is expressed as F’ (e™) , to make it clear that it is not a 


CTFT (which is denoted as F(2)). Click here for an overview of the 
notation used in Connexion's DSP modules. 


DTFT Definition demonstration 


lDTFET Demo 


Click on the above thumbnail image (when 
online) to download an interactive Mathematica 
Player demonstrating Discrete Time Fourier 
Transform. To Download, right-click and save 
target as .cdf. 


DTFT Summary 


Because complex exponentials are eigenfunctions of LTI systems, it is often 
useful to represent signals using a set of complex exponentials as a basis. 
The discrete time Fourier transform synthesis formula expresses a discrete 
time, aperiodic function as the infinite sum of continuous frequency 
complex exponentials. 

Equation: 


The discrete time Fourier transform analysis formula takes the same 
discrete time domain signal and represents the signal in the continuous 
frequency domain. 

Equation: 


fin) = = a F(wye" dw 


DFT as a Matrix Operation 


Matrix Review 
Recall: 


e Vectors in RY”: 


x0 
L1 
Vz;,,2; ER: | u= 
tL N-1 
° Vectors in C%: 
XL 
Ly 
Vo,0 2 Ci lee= 
tL N-1 
e ‘Transposition: 
1. transpose: 
‘4 
oS (oy a cue ee) 
2. conjugate: 
H _—— —— 
er =(29 Dy «an PN) 


e Inner product: 


1. real: 


2. complex: 


N-1 
H — 
2 y= a ee 
i=0 
e Matrix Multiplication: 
00 a1 s+ G0,N-1 L0 Yo 
10 Q11 tee Q1,N-1 L1 Y1 
Az = = 
QN-1,0 @N-1,1 ++. @N-1,N-1 N= YN-1 
N-1 
Yk = S Akntn 
n=0 
e Matrix Transposition: 
200 10 sae aN-1,0 
f aol Q11 ses QN-1,1 
Ae = 
Q0,N-1 @1,N-1 --- Q@N-1,N-1 


Matrix transposition involved simply swapping the rows with columns. 
AH — AT 
The above equation is Hermitian transpose. 


r 
A kn — Ank 


Representing DFT as Matrix Operation 


Now let's represent the DFT in vector-matrix notation. 


X(N -]] 


Here z is the vector of time samples and X is the vector of DFT 
coefficients. How are # and X related: 


X|k] = Dy z[nje~ rk") 
n=0 
where 
_ kn 
Arn = (c (%) ) = Wn 
SO 


X—Wz2 


where X is the DFT vector, W is the matrix and 2 the time domain vector. 


Wie (e-#))™ 


IDFT: 


where 


Wn™ is the matrix Hermitian transpose. So, 
ee 
e=—W XxX 
N 


where 2 is the time vector, 5 Wt is the inverse DFT matrix, and X is the 
DFT vector. 


The FFT Algorithm 
FFT 


(Fast Fourier Transform) An efficient computational algorithm for 
computing the DFT. 


The Fast Fourier Transform FFT 


DFT can be expensive to compute directly 
Vk,O<k<N-1: (x a 5 ane tt 


For each k, we must execute: 


e N complex multiplies 
e N — 1 complex adds 


The total cost of direct computation of an N-point DFT is 


e N? complex multiplies 
¢ N(N — 1) complex adds 


How many adds and mults of real numbers are required? 


This " O (N “ " computation rapidly gets out of hand, as NV gets large: 


N 1 10 100 1000 10° 


N2 1 100 10,000 10° 10” 


Image not finished 


The FFT provides us with a much more efficient way of computing the 
DFT. The FFT requires only " O(N log NV)" computations to compute the 
N-point DFT. 


N 10 100 1000 10° 
N2 100 10,000 10° 10” 
Nlog N 10 200 3000 6 x 10° 


How long is 101sec? More than 10 days! How long is 6 x 10° sec? 


Image not finished 


The FFT and digital computers revolutionized DSP (1960 - 1980). 


How does the FFT work? 
e The FFT exploits the symmetries of the complex exponentials 


© Wy” are called "twiddle factors" 


Symmetry 
Complex Conjugate Symmetry 


Wyk) = Wy) =: Wy 


Symmetry 
Periodicity in n and k 


Decimation in Time FFT 


e Just one of many different FFT algorithms 

e The idea is to build a DFT out of smaller and smaller DFTs by 
decomposing z|n]| into smaller and smaller subsequences. 

e Assume N = 2” (a power of 2) 


Derivation 


N is even, so we can complete X[k] by separating x|n] into two 
subsequences each of length 2. 


5 if n = even 
a XN if n = odd 
: = 


Vk,O<k<N-1: (x -\ sn") 


X[k] = So a[njWy' + So a[n|Wy 


n=2r n=2r+1 


where 0 <r < 7 —1.So0 
Equation: 


N 
a 


N_ 
X|k] = 4 a[2r]W yk" + a x[2r + 1]Wy 2rtDk 


N 
Beg 


D) kr k pues 9 kr 
gag elon (Wn ) +Wy doy alert] (Wn ) 


X[k] = >) 2[2r|Wa" +Ww* )_ 2[2r + Wa 


where Sa z[2r| Wx aie: + -point DFT of even samples ( G[k]) and 
a x|2r + 1|Ww Ar is 4 point DFT of odd samples ( H[k]). 
Vk,0O<k< N—1: (X[k] = G[k] + Wy*H[k]) 
Decomposition of an N-point DFT as a sum of 2 * point DFTs. 


Why would we want to do this? Because it is more efficient! 


Note:Cost to compute an N-point DFT is approximately N? complex 
mults and adds. 


But decomposition into 2 + -point DFTs + combination requires only 


NY? of NY? N? 
se ae N = —4+N 
(x) +(2) += 4 
where the first part is the number of complex mults and adds for + point 


DFT, Gk]. The second part is the number of complex mults and adds for ~ 


2 


-point DFT, Hk]. The third part is the number of complex mults and adds 
for combination. And the total is x + N complex mults and adds. 


Example: 
Savings 
For NV = 1000, 
N* = 10° 
N? 102 
— +N = — +1000 
2 2 


Because 1000 is small compared to 500,000, 


N2 10° 
2 D 


So why stop here?! Keep decomposing. Break each of the + -point DFTs 


into two + -point DFTs, etc., .... 


We can keep decomposing: 


where 
m = log, N = times 
Computational cost: V-pt DFT two + -pt DFTs. The cost is 


N? > 2(+) : + N. So replacing each +-pt DFT with two 7 -pt DFTs 
will reduce cost to 


N\* N N\?* N? N? 
2, 2{ — — N=4| — 2N = — +2N = — N 
(2) +z). (Z) + ae syne 
As we keep going p = {3,4,...,m}, where m = log, N. We get the cost 
N?2 N?2 


N + N log, N is the total number of complex adds and mults. 


For large N, cost ~ N log, N or" O(N log, N)", since N logs N >> N 
for large N. 


Image not finished 


N = 8 point FFT. 
Summing nodes 
W,,* twiddle 
multiplication 
factors. 


Note: Weird order of time samples 


Image not finished 


This is called 
"butterflies." 
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Matlab Example 

Hold operation 

system view 
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Why sample? 


This section introduces sampling. Sampling is the necessary fundament for 
all digital signal processing and communication. Sampling can be defined 
as the process of measuring an analog signal at distinct points. 


Digital representation of analog signals offers advantages in terms of 


robustness towards noise, meaning we can send more bits/s 

use of flexible processing equipment, in particular the computer 
more reliable processing equipment 

easier to adapt complex algorithms 


Claude E. Shannon 


Claude 
Elwood 
Shannon 

(1916-2001) 


Claude Shannon has been called the father of information theory, mainly 
due to his landmark papers on the !"Mathematical theory _of 


in 1928, but it was not proven until Shannon proved it 21 years later in the 
paper "Communications in the presence of noise". 


Notation 
In this chapter we will be using the following notation 


e Original analog signal x(t) 

e Sampling frequency F’, 

e Sampling interval T,, (Note that: PF’, = =) 

e Sampled signal x,(n). (Note that x,(n) = x(nT;)) 
¢ Real angular frequency 2 

e Digital angular frequency w. (Note that: w = (277) 


The Sampling Theorem 


Note: When sampling an analog signal the sampling frequency must be 
greater than twice the highest frequency component of the analog signal to 
be able to reconstruct the original signal from the sampled version. 


Proof 


Note: In order to recover the signal x(t) from it's samples exactly, it is 
necessary to sample x(t) at a rate greater than twice it's highest frequency 
component. 


Introduction 


As mentioned earlier, sampling is the necessary fundament when we want 
to apply digital signal processing on analog signals. 


Here we present the proof of the sampling theorem. The proof is divided in 
two. First we find an expression for the spectrum of the signal resulting 
from sampling the original signal x(t). Next we show that the signal x(t) 
can be recovered from the samples. Often it is easier using the frequency 
domain when carrying out a proof, and this is also the case here. 

Key points in the proof 


e We find an equation for the spectrum of the sampled signal 
e We find a simple method to reconstruct the original signal 
e The sampled signal has a periodic spectrum... 

e ...and the period is 2 x nF, 


Proof part 1 - Spectral considerations 


By sampling x(t) every T, second we obtain x,(n). The inverse fourier 
transform of this time discrete signal is 
Equation: 


x(n) = = | X,(e)e" dw 


—= 7G 


For convenience we express the equation in terms of the real angular 
frequency {2 using w = T,. We then obtain 
Equation: 


T 


T; i XxX, (er je dQ 


On 


x(n) 


The inverse fourier transform of a continuous signal is 
Equation: 


1 i 
a(t) = / X(iMe! 4.2 


From this equation we find an expression for x (nT; ) 
Equation: 


1 on ; 
(nT) = 5 / X(iDei"™ 4.2 


To account for the difference in region of integration we split the integration 
in [link] into subintervals of length ae and then take the sum over the 
resulting integrals to obtain the complete area. 

Equation: 


(2k+1)7 


2xtk 


Then we change the integration variable, setting (2 = 7 + a 


Equation: 


412 xTkn 


We obtain the final form by observing that e 
eeas T 
and multiplying by + 


= /, reinserting 7 = 2 


Equation: 
7. (a SA 2xnk\\ ,; 
Pe S. =X ;(Q4 inl, gq QC 
ake ) 27 - b= 66 Ts (i ( Ts ))e 


To make z,(n) = x(nT;) for all values of n, the integrands in [link] and 
[link] have to agreee, that is 


Equation: 
iT. 1 2rk 
KM) =F De x( (2+ 7p )) 


$ k=—oo 


This is a central result. We see that the digital spectrum consists of a sum of 
shifted versions of the original, analog spectrum. Observe the periodicity! 


We can also express this relation in terms of the digital angular frequency 


GT, 
Equation: 


sy 


S$ k=—o00 


This concludes the first part of the proof. Now we want to find a 
reconstruction formula, so that we can recover x(t) from x,(n). 


Proof part II - Signal reconstruction 


For a bandlimited signal the inverse fourier transform is 
Equation: 


1 fr 
Ce d X(iMei a2 
27 7. 


T. 


X(i2) 


s 


In the interval we are integrating we have: X,(e"7") = 
Substituting this relation into [link] we get 


Equation: 


A= T's pe X, (cic a 2 


Using the DTFT relation for X, (e177) we have 
Equation: 


Tt. fe = e . 
z(t) = [. S° a,(n)e Prt) ei q Q 


Ts; T=—00 


Interchanging integration and summation (under the assumption of 
convergence) leads to 
Equation: 


Finally we perform the integration and arrive at the important 
reconstruction formula 
Equation: 


(Thanks to R.Loos for pointing out an error in the proof.) 


Summary 


Illustrations 


In this module we illustrate the processes involved in sampling and 
reconstruction. To see how all these processes work together as a whole, 
take a look at the system view. In Sampling and reconstruction with Matlab 
we provide a Matlab script for download. The matlab script shows the 
process of sampling and reconstruction live. 


Basic examples 


Example: 
To sample an analog signal with 3000 Hz as the highest frequency 
component requires sampling at 6000 Hz or above. 


Example: 

The sampling theorem can also be applied in two dimensions, i.e. for 
image analysis. A 2D sampling theorem has a simple physical 
interpretation in image analysis: Choose the sampling interval such that it 
is less than or equal to half of the smallest interesting detail in the image. 


The process of sampling 


We start off with an analog signal. This can for example be the sound 
coming from your stereo at home or your friend talking. 


The signal is then sampled uniformly. Uniform sampling implies that we 
sample every 7’, seconds. In [link] we see an analog signal. The analog 
signal has been sampled at times t = n7T’,. 


Analog signal, samples are marked with dots. 


In signal processing it is often more convenient and easier to work in the 
frequency domain. So let's look at at the signal in frequency domain, [link]. 
For illustration purposes we take the frequency content of the signal as a 
triangle. (If you Fourier transform the signal in [link] you will not get such 
a nice triangle.) 


= 
ot 


The spectrum X (i). 


Notice that the signal in [link] is bandlimited. We can see that the signal is 
bandlimited because X(i{2) is zero outside the interval |[—Q,, 2g]. 
Equivalentely we can state that the signal has no angular frequencies above 


§2,, corresponding to no frequencies above F’, = 5°. 

Now let's take a look at the sampled signal in the frequency domain. While 
proving the sampling theorem we found the the spectrum of the sampled 
signal consists of a sum of shifted versions of the analog spectrum. 
Mathematically this is described by the following equation: 


Equation: 
~~ 2 
> *((era)) 
k=—0o Ts 


X, i) _ 


s+ 


Sampling fast enough 


In [link] we show the result of sampling x(t) according to the sampling 
theorem. This means that when sampling the signal in [link]/[link] we use 
F, > 2F,. Observe in [link] that we have the same spectrum as in [link] for 
2 € |-Q,4, 2], except for the scaling factor ra This is a consequence of 


the sampling frequency. As mentioned in the proof the spectrum of the 


sampled signal is periodic with period 27 Ff, = a 


The spectrum X ,. Sampling frequency is OK. 


So now we are, according to the sample theorem, able to reconstruct the 
original signal exactly. How we can do this will be explored further down 
under reconstruction. But first we will take a look at what happens when we 
sample too slowly. 


Sampling too slowly 


If we sample x(t) too slowly, that is FP’, < 2F',, we will get overlap 
between the repeated spectra, see [link]. According to [link] the resulting 
spectra is the sum of these. This overlap gives rise to the concept of 
aliasing. 


Note: If the sampling frequency is less than twice the highest frequency 
component, then frequencies in the original signal that are above half the 
sampling rate will be "aliased" and will appear in the resulting signal as 
lower frequencies. 


The consequence of aliasing is that we cannot recover the original signal, so 
aliasing has to be avoided. Sampling too slowly will produce a sequence 
x(n) that could have orginated from a number of signals. So there is no 
chance of recovering the original signal. To learn more about aliasing, take 
a look at this module. (Includes an applet for demonstration!) 


2m 4n (2 
Ts Ts 


The spectrum X ,. Sampling frequency is too low. 


To avoid aliasing we have to sample fast enough. But if we can't sample fast 
enough (possibly due to costs) we can include an Anti-Aliasing filter. This 
will not able us to get an exact reconstruction but can still be a good 
solution. 


Note: Typically a low-pass filter that is applied before sampling to ensure 
that no components with frequencies greater than half the sample 
frequency remain. 


Example: 

The stagecoach effect 

In older western movies you can observe aliasing on a stagecoach when it 
starts to roll. At first the spokes appear to turn forward, but as the 
stagecoach increase its speed the spokes appear to turn backward. This 
comes from the fact that the sampling rate, here the number of frames per 
second, is too low. We can view each frame as a sample of an image that is 
changing continuously in time. (Applet illustrating the stagecoach effect) 


Reconstruction 


Given the signal in [link] we want to recover the original signal, but the 
question is how? 


When there is no overlapping in the spectrum, the spectral component given 
by k = 0 (see [link]),is equal to the spectrum of the analog signal. This 
offers an oppurtunity to use a simple reconstruction process. Remember 
what you have learned about filtering. What we want is to change signal in 
[link] into that of [link]. To achieve this we have to remove all the extra 
components generated in the sampling process. To remove the extra 
components we apply an ideal analog low-pass filter as shown in [link] As 
we see the ideal filter is rectangular in the frequency domain. A rectangle in 
the frequency domain corresponds to a sinc function in time domain (and 
vice versa). 


|7(2)| 


H(iQ2) The ideal reconstruction filter. 


Then we have reconstructed the original spectrum, and as we know if two 
signals are identical in the frequency domain, they are also identical in 
the time domain. End of reconstruction. 


Conclusions 


The Shannon sampling theorem requires that the input signal prior to 
sampling is band-limited to at most half the sampling frequency. Under this 
condition the samples give an exact signal representation. It is truly 
remarkable that such a broad and useful class signals can be represented 
that easily! 


We also looked into the problem of reconstructing the signals form its 
samples. Again the simplicity of the principle is striking: linear filtering by 
an ideal low-pass filter will do the job. However, the ideal filter is 
impossible to create, but that is another story... 


Sampling and reconstruction with Matlab 


Matlab files 
Samprecon.m 


Exercises 


Systems view of sampling and reconstruction 


Ideal reconstruction system 


[link] shows the ideal reconstruction system based on the results of the 
Sampling theorem proof. 


[link] consists of a sampling device which produces a time-discrete 
sequence x,(n). The reconstruction filter, h(t), is an ideal analog sinc 


filter, with h(t) = sinc (+ . We can't apply the time-discrete sequence 


x(n) directly to the analog filter h(t). To solve this problem we turn the 
sequence into an analog signal using delta functions. Thus we write 


w(t) = >) ss Bal )OlE— nT), 


x(t) 2,(1) x(t) a(t) 


Ideal reconstruction system 


But when will the system produce an output 7(t) = x(t)? According to the 
sampling theorem we have x(t) = x(t) when the sampling frequency, F’,, 
is at least twice the highest frequency component of x(t). 


Ideal system including anti-aliasing 


To be sure that the reconstructed signal is free of aliasing it is customary to 
apply a lowpass filter, an anti-aliasing filter, before sampling as shown in 
[link]. 


s(t) #(t) 


Ideal reconstruction system with anti-aliasing filter 


Again we ask the question of when the system will produce an output 


x(t) = s(t)? If the signal is entirely confined within the passband of the 
lowpass filter we will get perfect reconstruction if F’, is high enough. 


But if the anti-aliasing filter removes the "higher" frequencies, (which in 
fact is the job of the anti-aliasing filter), we will never be able to exactly 
reconstruct the original signal, s(t). If we sample fast enough we can 
reconstruct x(t), which in most cases is satisfying. 


The reconstructed signal, #(t), will not have aliased frequencies. This is 
essential for further use of the signal. 


Reconstruction with hold operation 


To make our reconstruction system realizable there are many things to look 
into. Among them are the fact that any practical reconstruction system must 
input finite length pulses into the reconstruction filter. This can be 
accomplished by the hold operation. To alleviate the distortion caused by 
the hold opeator we apply the output from the hold device to a compensator. 
The compensation can be as accurate as we wish, this is cost and 
application consideration. 


x(t) z,(n) a(t) 
Sampling Hold Compensate ——- 


More practical reconstruction system with a hold component 


By the use of the hold component the reconstruction will not be exact, but 
as mentioned above we can get as close as we want. 


applet Exercises 


Sampling CT Signals: A Frequency Domain Perspective 


Understanding Sampling in the Frequency Domain 


We want to relate x,(t) directly to z[n]. Compute the CTFT of 


CO 


z,(t)= S_ 2-(nT)6(t — nT) 
Equation: 
X(2) = | ee Pog Le(nT)A(t — nT ye) dt 
ye Ze(nT) [. 5(t — nT) ec O™ dt 


yp oeemle 


= De. alnem 


n=—Cwo 


= X(w) 


where w = 2T and X(w) is the DTFT of x[n]. 


Note: 


Equation: 
X(w) = EYP. XelP— bMs) 
= FDR Xp) 


where this last part is 27-periodic. 


Sampling 


wy EX 
xiny med rary a X() 
DTET Je 


Example: 
Speech 
Speech is intelligible if bandlimited by a CT lowpass filter to the band +4 
kHz. We can sample speech as slowly as ? 
é ~tehe Feuer f 
Testine 


- 2 CTFT ( Xs(t)) 

= dene g [eehe multpl 

fem “teh le “ah t Pe oF ty | 
- Jet Nearf 


X (w) = DTFT(xcHI) 


~21T te fe} wv ys a a 1D 


Note that there is no mention of T or 12,! 


Relating x[n] to sampled x(t) 


Recall the following equality: 


Time Sseovency 


xe (4) = F xd 6-07) X(2) 


a is i ae 
ain Freg, O%'S 
[sta > [ett 4x) 


Recall the CTFT relation: 
Equation: 


xX 


1 
a 


r(at) > = 


where aq is a scaling of time and J is a scaling in frequency. 
Equation: 


X,(Q) = X(RT) 


The DFT: Frequency Domain with a Computer Analysis 


Introduction 


We just covered ideal (and non-ideal) (time) sampling of CT signals. This 
enabled DT signal processing solutions for CT applications ({link]): 


x[n] y [n] 
0 SS a] om 


Much of the theoretical analysis of such systems relied on frequency 
domain representations. How do we carry out these frequency domain 
analysis on the computer? Recall the following relationships: 


where w and {2 are continuous frequency variables. 


Sampling DTFT 


Consider the DTFT of a discrete-time (DT) signal z|n]. Assume x(n] is of 
finite duration JN (i.e., an V-point signal). 
Equation: 


where X(w) is the continuous function that is indexed by the real-valued 
parameter —7 < w < 7. The other function, z[n], is a discrete function that 


is indexed by integers. 


We want to work with X(w) on a computer. Why not just sample X (w)? 
Equation: 


X[k] = X(4tk 


In [link] we sampled at w = an where k = {0,1,..., NM — 1} and X[k] 
for k = {0,..., N — 1} is called the Discrete Fourier Transform (DFT) 
of x(n]. 


Example: 

Finite Duration DT Signal 

[missing resource: sec8_fig2.png] 

The DTFT of the image in [link] is written as follows: 
Equation: 


N-1 


X(w) = > a[njeer 


n=0 


where w is any 27-interval, for example —m7 <w < 7. 

Sample X(@) 

[missing resource: sec8_fig3.png] 

where again we sampled at w = tk where k = {0,1,..., M — 1}. For 
example, we take 


il —= IK) 


. In the following section we will discuss in more detail how we should 
choose M, the number of samples in the 27 interval. 
(This is precisely how we would plot X (w) in Matlab.) 


Choosing M 


Case 1 


Given N (length of z[n]), choose M >> N to obtain a dense sampling of 
the DTFT ([(link]): 


Case 2 

Choose WM as small as possible (to minimize the amount of computation). 

In general, we require MZ > N in order to represent all information in 
0,7 =10i204 NN —1}s(e}n)) 

Let's concentrate on M = N: 


zin] <> X[K 


forn = {0,...,N —1} andk = {0,...,N—1} 
numbers ++ N numbers 
Discrete Fourier Transform (DFT) 


Define 
Equation: 


where NV = length (xz[n]) andk = {0,..., N — 1}. In this case, M = N. 
Equation: 
DFT 


Equation: 
Inverse DFT (IDFT) 


N-1 
X|k je2™wr 
k=0 


1 
N 


Interpretation 


Represent z[n] in terms of a sum of NV complex sinusoids of amplitudes 
X[k] and frequencies 


2 
Vk, k € {0,....N—1}: («1 = +) 


Note: Fourier Series with fundamental frequency at 


Remark 1 


IDFT treats x|n] as though it were N-periodic. 


Equation: 


where n € {0,..., NM — 1} 
Exercise: 


Problem: What about other values of n? 
Solution: 


zi[n +N] =??? 


Remark 2 


Proof that the IDFT inverts the DFT for n € {0,...,N — 1} 
Equation: 
W Dba Xikle*™*" = 4 ko Dima alle Pram etn 
229 


Example: 
Computing DFT 
Given the following discrete-time signal ([link]) with NV = 4, we will 


compute the DFT using two different methods (the DFT Formula and 
Sample DTFT): 


x[n] 


1. DFT Formula 
Equation: 


1 4 e(-A)2nz 4. e(-A)2nq2 4. e(-#)2nz3 


1tel)rk 4 e(-dtk 4 @(-i)p7k 


Using the above equation, we can solve and get the following results: 


x|0| =4 
x(1|—0 
x|2)— 0 
T\s\p—=.0) 


2. Sample DTFT. Using the same figure, [link], we will take the DTFT 
of the signal and get the following equations: 


Equation: 
eo) = eo 

= 1—e(-t4e 

— 1—e(-#)w 

eas 
Our sample points will be: 

27k 
w, = 2m _ Ty 


where & = {0, 1, 2, 3} ([link]). 
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Periodicity of the DFT 


DFT X[k] consists of samples of DTFT, so X(w), a 27-periodic DTFT 
signal, can be converted to X[k], an N-periodic DFT. 
Equation: 


N-1 
X|k] = S- a[nje—0?7 wr 


n=0 


where e(-#)2™v" is an N -periodic basis function (See [link]). 


N Samples N Samples 


Also, recall, 
Equation: 


w[n] = a Dna X[hle?r¥” 
= W Dnco X[klewertmn) 


fag 


Example: 
Illustration 


Note: When we deal with the DFT, we need to remember that, in effect, 
this treats the signal as an V-periodic sequence. 


A Sampling Perspective 


Think of sampling the continuous function X(w), as depicted in [link]. 
S(w) will represent the sampling function applied to X(w) and is illustrated 
in [link] as well. This will result in our discrete-time sequence, X[k]. 


N= points 
Cl pevied ) 


Note: Remember the multiplication in the frequency domain is equal to 
convolution in the time domain! 


Inverse DTFT of S(@) 


Equation: 


Given the above equation, we can take the DTFT and get the following 
equation: 
Equation: 


N 3 d[n — mN] = S[n] 


M=—oO 
Exercise: 
Problem: Why does [link] equal $[n]? 
Solution: 


S|n] is N-periodic, so it has the following Fourier Series: 
Equation: 


ee 2 (—i)2r74n 
ce = wn d[nle vn" dn 
2 

a le 

— oN 
Equation: 

Sin] = el? aryn 
k=—0o 

where the DTFT of the exponential in the above equation is equal to 
5(w — =). 


So, in the time-domain we have ((link]): 


Nexin] 


Connections 


Fouriey Series 


XG) Discrete : 


Combine signals in [link] to get signals in [link]. 


Discrete + Periodic 
in Freq 
XxX) DET 


Discrete-Time Processing of CT Signals 


DT Processing of CT Signals 


DSP System 


Hanalei to 
diqival i duatog ” 


Analysis 
Equation: 
Y.(Q2) = Hyp(Q)Y (NT) 
where we know that Y(w) = X(w)G(w) and G(w) is the frequency 
response of the DT LTI system. Also, remember that 
w= OT 
50, 


Equation: 


Y.(Q) = Hrp(Q)G(AT)X(AT) 


where Y,.(2) and Hyp(2) are CTFTs and G(T) and X(T) are DTFTs. 


Note: 


OR 


Therefore our final output signal, Y.(2), will be: 
Equation: 


Y.(22) = Ayp(2)G(LT) (= 3 X(2 _ K2.) 


ae 
reconstruction filter in the D/A, [link]: 


Now, if X-(2) is bandlimited to - ose and we use the usual lowpass 


[missing_resource: sec9_fig2.png] 


Then, 
Equation: 


Y,(Q) = eniiarny if |Q| < & 


0 otherwise 


Summary 


For bandlimited signals sampled at or above the Nyquist rate, we can relate 
the input and output of the DSP system by: 
Equation: 


where 


P 2; 
G(OT) if |Q\< 2 


0 otherwise 


Gere(Q2) = | 


Note 


Gete(2) is LTI if and only if the following two conditions are satisfied: 


1. G(w) is LTI (in DT). 
2. X(T) is bandlimited and sampling rate equal to or greater than 
Nyquist. For example, if we had a simple pulse described by 


X.(t) = u(t — Ty) — u(t — T)) 


where 7 > To. If the sampling period 7’ > 7 — To, then some 
samples might "miss" the pulse while others might not be "missed." 
This is what we term time-varying behavior. 


Example: 


If se > 2B andw, < BT, determine and sketch Y,() using [link]. 


Application: 60Hz Noise Removal 


EKG Voltage Signal 


Unfortunately, in real-world situations electrodes also pick up ambient 60 
Hz signals from lights, computers, etc.. In fact, usually this "60 Hz noise" is 
much greater in amplitude than the EKG signal shown in [link]. [link] 
shows the EKG signal; it is barely noticeable as it has become 
overwhelmed by noise. 


n x(t) 


Our EKG signal, y(t), is 
overwhelmed by noise. 


DSP Solution 


x(t) —|y>[[ © > D/;|-—> Ft) 


noisy EKG DT LTT ——— 
"Notch Filter" sinned 


[missing resource: sec9_fig7b.png] 


Sampling Period/Rate 


First we must note that |Y(.2)| is bandlimited to +60 Hz. Therefore, the 
minimum rate should be 120 Hz. In order to get the best results we should 
set 


f; = 240 Hz 


O-2 On (240=*) 
S 


[missing resource: sec9_fig8.png] 


Digital Filter 


Therefore, we want to design a digital filter that will remove the 60Hz 
component and preserve the rest. 


Short Time Fourier Transform 


Short Time Fourier Transform 


The Fourier transforms (FT, DTFT, DFT, etc.) do not clearly indicate how 
the frequency content of a signal changes over time. 


That information is hidden in the phase - it is not revealed by the plot of the 
magnitude of the spectrum. 


Note: To see how the frequency content of a signal changes over time, we 
can cut the signal into blocks and compute the spectrum of each block. 


To improve the result, 


1. blocks are overlapping 
2. each block is multiplied by a window that is tapered at its endpoints. 


Several parameters must be chosen: 


¢ Block length, R. 

e The type of window. 

e Amount of overlap between blocks. ({link]) 
e Amount of zero padding, if any. 


STFT: Overlap Parameter 


NO OVERLAP 


-— DFT hora 


R/4 OVERLAP 


— DFT 1—+ 
*— DFT 2—+| 
+— DFT 3—+| 
+— DFT 4 —+ 


DFT 3 
DFT 4—+| 


R/2 OVERLAP 


|-— DFT 1 —+ 
ee 
-— DFT 4 —+| 


The parameter L 


L is the number of samples between adjacent blocks. 


The short-time Fourier transform is defined as 
Equation: 
X(w,m) = STFT (a2(n)) = DTFT (2(n — m)w(n)) 
ooo &(n — m)w(nje~ Hr) 


no &(n — m)w(n)je“ 


| 


where w(n) is the window function of length R. 


1. The STFT of a signal x(n) is a function of two variables: time and 
frequency. 

2. The block length is determined by the support of the window function 
w(n). 

3. A graphical display of the magnitude of the STFT, | X(w, m) 
the spectrogram of the signal. It is often used in speech processing. 

4. The STFT of a signal is invertible. 

5. One can choose the block length. A long block length will provide 
higher frequency resolution (because the main-lobe of the window 


, is called 


function will be narrow). A short block length will provide higher time 
resolution because less averaging across samples is performed for each 


STFT value. 


6. A narrow-band spectrogram is one computed using a relatively long 


block length R, (long window function). 
7. A wide-band spectrogram is one computed using a relatively short 
block length R, (short window function). 


Sampled STFT 


To numerically evaluate the STFT, we sample the frequency axis w in NV 
equally spaced samples from w = 0 to w = 27. 
Equation: 


2 
vk,0 << N—1: (wn = 7h) 


We then have the discrete STFT, 
Equation: 


X4(k,m) = X(22k,m) = Su) a(n — m)w 


where 0,...0 is N — R. 


In this definition, the overlap between adjacent blocks is R — 1. The signal 
is shifted along the window one sample at a time. That generates more 
points than is usually needed, so we also sample the STFT along the time 
direction. That means we usually evaluate 


X%(k, Lm) 


where L is the time-skip. The relation between the time-skip, the number of 
overlapping samples, and the block length is 


Overlap = R— L 


Exercise: 


Problem: Match each signal to its spectrogram in [link]. 


SS == = 
—— = ==. 


- wn oO WwW Kf fF He CO HH fr fe He CO KH fr fF KH CO WH 
3 ra ra 3 


SPECTAOGAAM A 


50 100 150 
time 


o 


SPECTAOGAAM C 


50 100 150 
time 


o 


Solution: 


Spectrogram Example 


SPECTAOGRAM B 


time 


SPECTAOGAAM D 


o 


500 1000 1500 2000 2500 3000 3500 4000 
n 


SPECTROGRANM, R = 256 


0 0.0 01 0.15 02 0.25 03 035 04 045 oS 
time 


The matlab program for producing the figures above ({link] and [link]). 


% LOAD DATA 
load mtlb; 
x = mtlb; 


figure(1), clf 
plot(0:4000, x) 
Xlabel('n') 

ylabel('x(n)') 


% SET PARAMETERS 

R = 256; % R: block length 

window = hamming(R); % window function 
of length R 


N = 512; % N: frequency 
discretization 

L. = 35; % L: time lapse 
between blocks 

fs = 7418; % fS: sampling 


frequency 
overlap = R - L; 


% COMPUTE SPECTROGRAM 
[B,f,t] = 
specgram(x,N, fs,window, overlap); 


% MAKE PLOT 

figure(2), clf 
imagesc(t,f,1log10(abs(B))); 
colormap( 'jet') 

axis xy 

Xlabel('time' ) 

ylabel( 'frequency' ) 
title('SPECTROGRAM, R = 256') 


Effect of window length R 


Narrow-band spectrogram: better frequency resolution 
SPECTROGRAM, R = 512 


0 0.0 01 0.15 02 025 03 035 04 04S 
time 


Wide-band spectrogram: better time resolution 
SPECTROGRAM, R = 128 


0 0.05 01 0.15 02 0.2 03 Os o4 04S os 
time 


Here is another example to illustrate the frequency/time resolution trade-off 
(See figures - [link], [link], and [link]). 
Effect of Window Length R 


SIGNAL 
te 


0 20 40 60 &0 


BLOGK LENGTH A = 10 


frequency 


0 20 40 60 8&0 
time 


BLOCK LENGTH A =45 


o 


20 40 60 &0 


Effect of L and N 


100 120 140 160 180 200 


BLOGK LENGTH A =30 


0.5 
0.4 
20.3 
a 
=| 
Boe 
0.4 
i) 
i) 20 840 60 80 


time 


BLOCK LENGTH A= 110 


0.5 

0.4 

0.3 
§ 

Foe 

0.1 

i) 

i) 20 «40 60 80 


time 


A spectrogram is computed with different parameters: 


Pet, 10) 


N € {32, 256} 


e [= time lapse between blocks. 
e N =FFT length (Each block is zero-padded to length NV.) 


In each case, the block length is 30 samples. 
Exercise: 


Problem: 


For each of the four spectrograms in [link] can you tell what ZL and N 


are? 
10 


8 


6 


SIGNAL 


-2 


SPECTROGRAM 4 
05 
04 
20.3 
a 
3 
Bop 
0.1 
0 
0 50 100 150 
time 
SPECTROGRAM C 
05 
04 
S03 
§ 
Foz 
0.1 
0 
0 50 100 150 
time 
Solution: 


SPECTAOGAAM B 


0 50 100 150 
time 


SPECTAOGAAM D 


150 


o 
17a] 
o 
_ 
o 
oS 


L and N do not effect the time resolution or the frequency resolution. They 


only affect the 'pixelation’. 


Effect of R and L 


Shown below are four spectrograms of the same signal. Each spectrogram 
is computed using a different set of parameters. 


R & {120, 256, 1024} 


L € {35,250} 


where 


e R= block length 
e [ = time lapse between blocks. 


Exercise: 


Problem: 


For each of the four spectrograms in [link], match the above values of 
Land R. 


(4) SPECTAROGAAM, A =?,L=? (B) SPECTROGAAM, A= ?, L=? 
1 


0 0 
0 1000 «=6©2000 «€6©3000)8=6—4000 0 1000 »«=©2000) §€6©63000)6=—s« 4000 
time time 
(C) SPECTAOGAAM, A = ?,L = ? (D) SPECTROGAAN, A = ?,L=? 
1 


0 0 
0 1000 2000 3000 8 4000 0 1000 2000 3000 4000 
time time 


Solution: 


If you like, you may listen to this signal with the Soundsc command; the 
data is in the file: stft_data.m. Here is a figure of the signal. 
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Filtering with the DFT 


Introduction 


xo) yor) 


Equation: 
yln]| = aln|*h|[n] 


Equation: 


Assume that H(w) is specified. 
Exercise: 


Problem: How can we implement X(w)H(w) in a computer? 


Solution: 


Discretize (sample) X(w) and H(w). In order to do this, we should 
take the DFTs of z[n] and h[n] to get X[k] and X[k]. Then we will 
compute 


g{n] = IDFT (X|k]H[A)) 


Does y[n] = y|n]? 


Recall that the DFT treats N-point sequences as if they are periodically 
extended ([link]): 


xin) 


6 wet n 
fuse ee 
original finite 

duration signal 


Ying = x CCH] 


periodic 
extension 
Compute IDFT of Y[k] 
Equation: 
ain] = a Deo ¥ [ble 
= 7 Dba XUARe*t 


And the IDFT periodically extends h|n]: 


~ 


hin —m] = h{((n — m ))y) 


This computes as shown in [link]: 


h[(())n_] 


xf} 
aA . eatllil tect te 
Q wl n \ jt i ey | n 


periodic 
extension 


Equation: 


is called circular convolution and is denoted by [link]. 


Yi[nJ= x[n] @ h[n] 
The above symbol 
for the circular 
convolution is for an 
N-periodic 
extension. 


DFT Pair 


DFT 
x(n] @ h[n] <——> (x[k]) (H[k]) 


Note that in general: 


x[n] @ h[n] 4 x[n]* h[n] 


Example: 
Regular vs. Circular Convolution 
To begin with, we are given the following two length-3 signals: 


2 — 4 lee oy 
feet 


We can zero-pad these signals so that we have the following discrete 
sequences: 


lt eee Ol DOs eat 
h[n] = {...,0,1,0,2,0,...} 
where z[0] = 1 and h[0] = 1. 


e Regular Convolution: 
Equation: 


Using the above convolution formula (refer to the link if you need a 
review of convolution), we can calculate the resulting value for y[0] to 
y|4]. Recall that because we have two length-3 signals, our convolved 
signal will be length-5. 


on — 0 
fee OSU S028 Oat 
fee Oe OOO Orme 


Equation: 


yo] = 1x14+2x0+4+3x0 
1 
Cea 
Pee ied ad <r 
fee Oe ON eset 
Equation: 
yl] = 1x04+2x1+4+3x0 
2 
oni 
shee Ue Bee es eae 
teen Qed) aae ett 
Equation: 
yf2]) = 1x24+2x04+3x1 
5 
Cnt) 
Equation: 
y|3] =4 
on —A4 
Equation: 
yl4] = 6 


Regular Convolution Result 


Result is finite duration, not 
periodic! 


e Circular Convolution: 
Equation: 


m=0 


And now with circular convolution our h[n] changes and becomes a 
periodically extended signal: 
Equation: 


Al(( 7 )) wl nes tess Oe 2 OF 2 NOR 2 ray 


he (0) 
fore, LO SUMO keds ular 
(reel eel Oe loan 
Equation: 
gO] = 1x14+2x24+3x0 
5 
on dll 


(neo sO 2 oan) 


eed Valle Olen hen 


Equation: 
gl) = lx 12 < 1-3 x2 
8 
47 = Dy: 
Equation: 
y|2] = 5 
on=3 
Equation: 
y|3] = 5 
oo) 47), = 4 
Equation: 
yl4] = 8 


Circular Convolution Result 


Result is 3-periodic. 


[link] illustrates the relationship between circular convolution and regular 
convolution using the previous two figures: 
Circular Convolution from Regular 


The left plot (the circular 
convolution results) has a 
"wrap-around" effect due to 
periodic extension. 


Regular Convolution from Periodic Convolution 


1. ""Zero-pad" x[n| and h|n] to avoid the overlap (wrap-around) effect. 
We will zero-pad the two signals to a length-5 signal (5 being the 
duration of the regular convolution result): 


iin) = 41.2.3, 0-0} 
h(n] = {1, 0, 2,0, 03 


2. Now take the DFTs of the zero-padded signals: 
Equation: 


gin) = 4 i-o X[K H[ke?™" 
= Yio 2[mh[((n-m))s] 


Now we can plot this result ((link]): 


No Image. 


The sequence 
from 0 to 4 
(the 
underlined 
part of the 
sequence) is 
the regular 
convolution 
result. From 
this 
illustration 
we Can see 
that it is 5- 

periodic! 


Note: We can compute the regular convolution result of a convolution of 
an M-point signal x|n] with an N-point signal h|n] by padding each 
signal with zeros to obtain two M + N — 1 length sequences and 
computing the circular convolution (or equivalently computing the IDFT 
of H|k|X|k], the product of the DFTs of the zero-padded signals) ({link]). 


8 n- ; a sail Ie Mrhel on 
= 7 wv? 
a paling sore peddng 


Note that the lower two images are simply the 
top images that have been zero-padded. 


DSP System 


No Image. 


The system 
has a length 
N impulse 
response, 
hin| 


1. Sample finite duration continuous-time input x(t) to get x|n] where 


= 405s: 224d = 1}. 
2. Zero-pad x[n] and h|n] to length M+ N — 1. 
3. Compute DFTs X[k] and Hk] 
4. Compute IDFTs of X[k]. H|k] 


y[n] = gn 


where n = {0,..., M+ N — 1}. 
5. Reconstruct y(t) 


Image Restoration Basics 


Image Restoration 


In many applications (e.g., satellite imaging, medical imaging, astronomical 
imaging, poor-quality family portraits) the imaging system introduces a 
slight distortion. Often images are slightly blurred and image restoration 
aims at deblurring the image. 


The blurring can usually be modeled as an LSI system with a given PSF 
h[m, nj. 


h[m,n] <i H(u,v) 


Fourier Transform (FT) 
relationship between the 
two functions. 


The observed image is 


Equation: 

g(m, n] = h{m, n|* fim, n] 
Equation: 

G(u,v) = H(u, v)F(u, v) 
Equation: 


Example: 

Image Blurring 

Above we showed the equations for representing the common model for 
blurring an image. In [link] we have an original image and a PSF function 
that we wish to apply to the image in order to model a basic blurred image. 


a 


Once we apply the PSF to the original image, we receive our blurred image 
that is shown in [Link]: 


f 
| 


Frequency Domain Analysis 


[link] looks at the original images in its typical form; however, it is often 
useful to look at our images and PSF in the frequency domain. In [link], we 
take another look at the image blurring example above and look at how the 
images and results would appear in the frequency domain if we applied the 
fourier transforms. 


Difference Equation 
(Blank Abstract) 


Introduction 


One of the most important concepts of DSP is to be able to properly represent the input/output 
relationship to a given LTI system. A linear constant-coefficient difference equation (LCCDE) 
serves as a way to express just this relationship in a discrete-time system. Writing the sequence of 
inputs and outputs, which represent the characteristics of the LTI system, as a difference equation help 
in understanding and manipulating a system. 


difference equation 
An equation that shows the relationship between consecutive values of a sequence and the 
differences among them. They are often rearranged as a recursive formula so that a systems 
output can be computed from the input signal and past outputs. 


Example: 
Equation: 


General Formulas for the Difference Equation 


As stated briefly in the definition above, a difference equation is a very useful tool in describing and 
calculating the output of the system described by the formula for a given sample n. The key property 
of the difference equation is its ability to help easily find the transform, H(z), of a system. In the 
following two subsections, we will look at the general form of the difference equation and the general 
conversion to a z-transform directly from the difference equation. 


Difference Equation 


The general form of a linear, constant-coefficient difference equation (LCCDE), is shown below: 
Equation: 


N M 
S > axyln —k)= S 5 bea[n — k] 
k=0 k=0 


We can also write the general form to easily express a recursive output, which looks like this: 
Equation: 


N M 
y(n] = — S axyln —k+ S> bpx[n — k] 
k=l k=0 


From this equation, note that y[n — k] represents the outputs and x|[n — k] represents the inputs. The 
value of N represents the order of the difference equation and corresponds to the memory of the 
system being represented. Because this equation relies on past values of the output, in order to 
compute a numerical solution, certain past outputs, referred to as the initial conditions, must be 
known. 


Conversion to Z-Transform 


Using the above formula, [link], we can easily generalize the transfer function, H(z), for any 
difference equation. Below are the steps taken to convert any difference equation into its transfer 
function, i.e. z-transform. The first step involves taking the Fourier Transform of all the terms in 
[link]. Then we use the linearity property to pull the transform inside the summation and the time- 
shifting property of the z-transform to change the time-shifting terms to exponentials. Once this is 
done, we arrive at the following equation: ag = 1. 


Equation: 
N M 
¥(z)=— So an¥(z)z* + So bg X(2)z* 
k=1 k=0 
Equation: 
Y(z 
He) = x6 
_ Bia bz * 


1+ a2 


Conversion to Frequency Response 


Once the z-transform has been calculated from the difference equation, we can go one step further to 
define the frequency response of the system, or filter, that is being represented by the difference 
equation. 


Note: Remember that the reason we are dealing with these formulas is to be able to aid us in filter 
design. A LCCDE is one of the easiest ways to represent FIR filters. By being able to find the 
frequency response, we will be able to look at the basic properties of any filter represented by a 
simple LCCDE. 


Below is the general formula for the frequency response of a z-transform. The conversion is simple a 
matter of taking the z-transform formula, H(z), and replacing every instance of z with e’”. 
Equation: 

A(w) H(z) | nemei 
See 


2a age (wk) 


I 


Once you understand the derivation of this formula, look at the module concerning Filter Design from 
the Z-Transform for a look into how all of these ideas of the Z-transform, Difference Equation, and 
Pole/Zero Plots play a role in filter design. 


Example 


Example: 

Finding Difference Equation 

Below is a basic example showing the opposite of the steps above: given a transfer function one can 
easily calculate the systems difference equation. 

Equation: 


Eas 
1 3 
(= aa 7) 
Given this transfer function of a time-domain filter, we want to find the difference equation. To begin 


with, expand both polynomials and divide them by the highest order z. 
Equation: 


He) — 


~~ (z+1)(z+1) 

H@) = Tenersy 

ze42z+1 
242z4+1— a 

142271427? 
14fz1-37% 
From this transfer function, the coefficients of the two polynomials will be our a; and 6; values 
found in the general difference equation formula, [link]. Using these coefficients and the above form 
of the transfer function, we can easily write the difference equation: 
Equation: 


alm] + 22[n— 1] +2{n— 2] = yin] + Zyl — 1] — Syn — 3] 


In our final step, we can rewrite the difference equation in its more common form showing the 
recursive nature of the system. 
Equation: 


yn] = a{n] + 22[n —1] + 2[n 2] + yn —1]4 


Solving a LCCDE 


In order for a linear constant-coefficient difference equation to be useful in analyzing a LTI system, 
we must be able to find the systems output based upon a known input, x(n), and a set of initial 
conditions. Two common methods exist for solving a LCCDE: the direct method and the indirect 
method, the later being based on the z-transform. Below we will briefly discuss the formulas for 
solving a LCCDE using each of these methods. 


Direct Method 


The final solution to the output based on the direct method is the sum of two parts, expressed in the 
following equation: 
Equation: 


y(n) = n(n) + Yp(r) 


The first part, yp, (7), is referred to as the homogeneous solution and the second part, yp,(7), is 
referred to as particular solution. The following method is very similar to that used to solve many 
differential equations, so if you have taken a differential calculus course or used differential equations 
before then this should seem very familiar. 


Homogeneous Solution 


We begin by assuming that the input is zero, x(n) = 0. Now we simply need to solve the 
homogeneous difference equation: 
Equation: 


N 
S- axyln —k| =0 
k=0 


In order to solve this, we will make the assumption that the solution is in the form of an exponential. 
We will use lambda, A, to represent our exponential terms. We now have to solve the following 
equation: 

Equation: 


N 
ya =0 
k=0 


We can expand this equation out and factor out all of the lambda terms. This will give us a large 
polynomial in parenthesis, which is referred to as the characteristic polynomial. The roots of this 
polynomial will be the key to solving the homogeneous equation. If there are all distinct roots, then 
the general solution to the equation will be as follows: 

Equation: 


yn(n) = Ci(A1)" + C2(A2)” +...+ Cyn(An)” 


However, if the characteristic equation contains multiple roots then the above general solution will be 
slightly different. Below we have the modified version for an equation where A; has K multiple 
roots: 

Equation: 


yn(n) —s C1(A1)" + Cyn(A1)” + Cyn?(A1)" a aesever ie Cyn*®—"(d1)" + C2(A2)" ae a Cn(An)” 


Particular Solution 


The particular solution, y,(7), will be any solution that will solve the general difference equation: 
Equation: 


N M 
S> aryp(n — k) = S> b,x(n — k) 
k=0 k=0 


In order to solve, our guess for the solution to y,(n) will take on the form of the input, x(n). After 
guessing at a solution to the above equation involving the particular solution, one only needs to plug 
the solution into the difference equation and solve it out. 


Indirect Method 


The indirect method utilizes the relationship between the difference equation and z-transform, 
discussed earlier, to find a solution. The basic idea is to convert the difference equation into a z- 
transform, as described above, to get the resulting output, Y(z). Then by inverse transforming this 
and using partial-fraction expansion, we can arrive at the solution. 

Equation: 


Z{y (n+1) —y(n)} = 2¥ (z) — y (0) 


This can be interatively extended to an arbitrary order derivative as in Equation [link]. 
Equation: 


Now, the Laplace transform of each side of the differential equation can be taken 
Equation: 


N N-1 
Z So ax c (n-m+1)— S~ y(n-m)y(n) 


which by linearity results in 
Equation: 


N 
> ott y (n-m+1) — 
k=0 


and by differentiation properties in 
Equation: 


Rearranging terms to isolate the Laplace transform of the output, 
Equation: 


Z{a(n)} + Vo mao ez ™ yl (0) 


ys az" 


Z{y(n)} = 


Thus, it is found that 
Equation: 


X(z)+ pl 0 ae Gee m—ly(™) (0) 


ae a,z* 


rig\= 


In order to find the output, it only remains to find the Laplace transform X(z) of the input, substitute 
the initial conditions, and compute the inverse Z-transform of the result. Partial fraction expansions 
are often required for this last step. This may sound daunting while looking at [link], but it is often 
easy in practice, especially for low order difference equations. [link] can also be used to determine the 
transfer function and frequency response. 


As an example, consider the difference equation 
Equation: 


y[n-2| + dy [n-2] + 3y [n] =cos (n) 


with the initial conditions y’ (0) = 1 and y(0) = 0 Using the method described above, the Z 
transform of the solution y[n] is given by 
Equation: 


Performing a partial fraction decomposition, this also equals 
Equation: 


1 1 1 
Y [z] =.25 35 ai a, 
cee | z+3 z2+1 z2+1 


Computing the inverse Laplace transform, 
Equation: 


y(n) = (.252°-"—. 35z- "+. 1 cos (n)+.2 sin (n))u(n). 


One can check that this satisfies that this satisfies both the differential equation and the initial 
conditions. 


The Z Transform: Definition 
A brief definition of the z-transform, explaining its relationship with the 
Fourier transform and its region of convergence, ROC. 


Basic Definition of the Z-Transform 


The z-transform of a sequence is defined as 
Equation: 


Sometimes this equation is referred to as the bilateral z-transform. At 
times the z-transform is defined as 
Equation: 


which is known as the unilateral z-transform. 


There is a close relationship between the z-transform and the Fourier 
transform of a discrete time signal, which is defined as 
Equation: 


Notice that that when the z~” is replaced with e~ “”) the z-transform 


reduces to the Fourier Transform. When the Fourier Transform exists, 
z= e™ , which is to have the magnitude of z equal to unity. 


The Complex Plane 


In order to get further insight into the relationship between the Fourier 
Transform and the Z-Transform it is useful to look at the complex plane or 
z-plane. Take a look at the complex plane: 

Z-Plane 


The Z-plane is a complex plane with an imaginary and real axis referring to 
the complex-valued variable z. The position on the complex plane is given 
by re , and the angle from the positive, real axis around the plane is 
denoted by w. X(z) is defined everywhere on this plane. X (ec) on the 
other hand is defined only where |z| = 1, which is referred to as the unit 
circle. So for example, w = 1 at z= 1 andw = wat z = —1. This is useful 
because, by representing the Fourier transform as the z-transform on the 
unit circle, the periodicity of Fourier transform is easily seen. 


Region of Convergence 


The region of convergence, known as the ROC, is important to understand 
because it defines the region where the z-transform exists. The ROC for a 
given x[n] , is defined as the range of z for which the z-transform 
converges. Since the z-transform is a power series, it converges when 
z|n|z~” is absolutely summable. Stated differently, 

Equation: 


Oo 


> |jz[n]z-”] < 00 


n=—CO 


must be satisfied for convergence. This is best illustrated by looking at the 
different ROC's of the z-transforms of a”u|n] and a” u[n — 1]. 


Example: 
For 
Equation: 


z|n| = a”uln] 


z|n| = a”u|n| where a = 0.5. 


Equation: 


This sequence is an example of a right-sided exponential sequence because 
it is nonzero for n > 0. It only converges when Jaz | < 1. When it 
converges, 


Equation: 
X(z) = —— 


If |az~+] > 1, then the series, $>*°_, (az~+)" does not converge. Thus 
the ROC is the range of values where 


Equation: 
laz~*| =< l 
or, equivalently, 
Equation: 
lz] > lo 


ROC for z[n] = a™ul[n] 
where a = 0.5 


Example: 
For 


Equation: 


z|n| = (—a”)u|(—n) — 1] where 
a = 0.5. 


Equation: 


DG —= Se alae 


n=—CO 


= Vr. (-a”)ul-n - 1]z 
Se = ace are? 


a rae (a~*z) a 
ae a) 

= i= Ss ar) 
The ROC in this case is the range of values where 
Equation: 


la*z| ol. 


or, equivalently, 
Equation: 


I2| < lal 


If the ROC is satisfied, then 
Equation: 


Table of Common z-Transforms 


Lists the z-transform and region of convergence (ROC) for several common 
discrete-time signals. 


The table below provides a number of unilateral and bilateral z-transforms. 
The table also specifies the region of convergence. 


Note: The notation for z found in the table below may differ from that 
found in other tables. For example, the basic z-transform of u[n] can be 
written as either of the following two expressions, which are equivalent: 


Equation: 
Zo 1 
z-1 1-271 
Signal Z-Transform ROC 
6[n — k] ze All(z) 
uln] a z}>1 
—u|(—n) — 1] a 2) 2 


na”uln| 


[Tp n—k+1 a 


a™m! 


7” cos(an)u|n] 


y” sin(an)u[n] 


Z-Transform 


z(z+1) 
(z-1)° 


z(z—7y cos(a)) 


z2—(2ycos(a))z+y? 


zysin(a) 


z?—(2ycos(a))z+y? 


Iz] > | 


Iz] > | 


Understanding Pole/Zero Plots on the Z-Plane 

This module will look at the relationships between the z-transform and the 
complex plane. Specifically, the creation of pole/zero plots and some of 
their useful properties are discussed. 


Introduction to Poles and Zeros of the Z-Transform 


It is quite difficult to qualitatively analyze the Laplace transform and Z- 
transform, since mappings of their magnitude and phase or real part and 
imaginary part result in multiple mappings of 2-dimensional surfaces in 3- 
dimensional space. For this reason, it is very common to examine a plot of a 
transfer function's poles and zeros to try to gain a qualitative idea of what a 
system does. 


Once the Z-transform of a system has been determined, one can use the 
information contained in function's polynomials to graphically represent the 
function and easily observe many defining characteristics. The Z-transform 
will have the below structure, based on Rational Functions: 

Equation: 


The two polynomials, P(z) and Q(z), allow us to find the poles and zeros 
of the Z-Transform. 


Zeros 
The value(s) for z where P(z) = 0. 
The complex frequencies that make the overall gain of the filter 
transfer function zero. 


poles 
The value(s) for z where Q(z) = 0. 
The complex frequencies that make the overall gain of the filter 
transfer function infinite. 


Example: 
Below is a simple transfer function with the poles and zeros shown below 
it. 


z+1 


FAG) perme SEVEN 
(z— 3) (2+ 4) 

The zeros are: {—1} 

The poles are: {+ —3} 


The Z-Plane 


Once the poles and zeros have been found for a given Z-Transform, they 
can be plotted onto the Z-Plane. The Z-plane is a complex plane with an 
imaginary and real axis referring to the complex-valued variable z. The 
position on the complex plane is given by re”? and the angle from the 
positive, real axis around the plane is denoted by 8. When mapping poles 
and zeros onto the plane, poles are denoted by an "x" and zeros by an "o". 
The below figure shows the Z-Plane, and examples of plotting zeros and 
poles onto the plane can be found in the following section. 
Z-Plane 

Im(z) 


re!* 


Re(z) 


Examples of Pole/Zero Plots 


This section lists several examples of finding the poles and zeros of a 
transfer function and then plotting them onto the Z-Plane. 


Example: 
Simple Pole/Zero Plot 


The zeros are: {0} 
The poles are: { > = +} 
Pole/Zero Plot 

Im (z) 


Re(z) 


Using the zeros and 
poles found from the 
transfer function, the 
one zero is mapped to 

zero and the two 
poles are placed at 5 


3 
and — 7 
Example: 
Complex Pole/Zero Plot 
OVE (z — 7) (z+i) 


The zeros are: {i, —7} 


The poles are: {-1, = ee = = 


Fe 
Pole/Zero Plot 


O 
y 


Using the zeros and 
poles found from the 
transfer function, the 
zeros are mapped to 
+(2), and the poles 
are placed at —1, 


il ie 1 ; 
rie ens 1 


tole 


Example: 
Pole-Zero Cancellation 
An easy mistake to make with regards to poles and zeros is to think that a 


function like ee is the same as s + 3. In theory they are equivalent, 


as the pole and zero at s = 1 cancel each other out in what is known as 
pole-zero cancellation. However, think about what may happen if this 
were a transfer function of a system that was created with physical circuits. 
In this case, it is very unlikely that the pole and zero would remain in 
exactly the same place. A minor temperature change, for instance, could 
cause one of them to move just slightly. If this were to occur a tremendous 
amount of volatility is created in that area, since there is a change from 
infinity at the pole to zero at the zero in a very small range of signals. This 


is generally a very bad way to try to eliminate a pole. A much better way is 
to use control theory to move the pole to a better place. 


Note: 

Repeated Poles and Zeros 

It is possible to have more than one pole or zero at any given point. For 
instance, the discrete-time transfer function H(z) = z? will have two zeros 
at the origin and the continuous-time function H(s) = 7 will have 25 
poles at the origin. 


MATLAB - If access to MATLAB is readily available, then you can use its 
functions to easily create pole/zero plots. Below is a short program that 
plots the poles and zeros from the above example onto the Z-Plane. 


% Set up vector for zeros 
ZN eI 

% Set up vector for poles 
p= [-1;, .5+.5j) ; .5-.5)]; 


figure(1); 

zplane(z,P); 

title('Pole/Zero Plot for Complex 
Pole/Zero Plot Example'); 


Interactive Demonstration of Poles and Zeros 


:Pole-ZeroDrillDemo 


Interact (when online) with a Mathematica CDF 
demonstrating Pole/Zero Plots. To Download, right- 
click and save target as .cdf. 


Applications for pole-zero plots 


Stability and Control theory 


Now that we have found and plotted the poles and zeros, we must ask what 
it is that this plot gives us. Basically what we can gather from this is that the 
magnitude of the transfer function will be larger when it is closer to the 
poles and smaller when it is closer to the zeros. This provides us with a 
qualitative understanding of what the system does at various frequencies 
and is crucial to the discussion of stability. 


Pole/Zero Plots and the Region of Convergence 


The region of convergence (ROC) for X(z) in the complex Z-plane can be 
determined from the pole/zero plot. Although several regions of 
convergence may be possible, where each one corresponds to a different 
impulse response, there are some choices that are more practical. A ROC 
can be chosen to make the transfer function causal and/or stable depending 
on the pole/zero plot. 

Filter Properties from ROC 


e If the ROC extends outward from the outermost pole, then the system 
is causal. 
e If the ROC includes the unit circle, then the system is stable. 


Below is a pole/zero plot with a possible ROC of the Z-transform in the 
Simple Pole/Zero Plot discussed earlier. The shaded region indicates the 
ROC chosen for the filter. From this figure, we can see that the filter will be 
both causal and stable since the above listed conditions are both met. 


Example: 
pas Ss 
(z- 3) (2+ 3) 


Region of Convergence for the Pole/Zero Plot 


fal) 


The shaded area 
represents the chosen 
ROC for the transfer 

function. 


Frequency Response and Pole/Zero Plots 


The reason it is helpful to understand and create these pole/zero plots is due 
to their ability to help us easily design a filter. Based on the location of the 
poles and zeros, the magnitude response of the filter can be quickly 
understood. Also, by starting with the pole/zero plot, one can design a filter 
and obtain its transfer function very easily. 


Linear-Phase FIR Filters 


THE AMPLITUDE RESPONSE 


If the real and imaginary parts of H/(w) are given by 
Equation: 


HS (w) = R(w) + iF (w) 


the magnitude and phase are defined as 


so that 
Equation: 


HT (w) = |H4(w) |e” 


With this definition, 
write H(w) as 
Equation: 


H!(w)| is never negative and p(w) is usually discontinuous, but it can be very helpful to 


Hf (w) = A(w)e*™) 


where A(w) can be positive and negative, and @(w) continuous. A(w) is called the amplitude response. [link] 
illustrates the difference between |H f (w)| and A(w). 


|H(oo)| = | Aten) Phase{H(00)} 


A linear-phase phase filter is one for which the continuous phase 0(w) is linear. 


Hf (w) = A(w)e®™) 
with 
O(w) = Mw +B 


We assume in the following that the impulse response A(7n) is real-valued. 


WHY LINEAR-PHASE? 
If a discrete-time cosine signal 
xi(n) = cos(win+ 1) 
is processed through a discrete-time filter with frequency response 
Hf (w) = A(w)e®™) 
then the output signal is given by 
y(n) = A(wr) cos(win + 1 + O(w1)) 


or 


n(n) = Alwr) os(wi (n+ 2) 4 1) 


1 
The LTI system has the effect of scaling the cosine signal and delaying it by Se) 


Exercise: 


Problem: When does the system delay cosine signals with different frequencies by the same amount? 


Solution: 
° He) = constant 
° Ow) = Kw 


e The phase is linear. 


O(w) 


The function —— is called the phase delay. A linear phase filter therefore has constant phase delay. 


WHY LINEAR-PHASE: EXAMPLE 


Consider a discrete-time filter described by the difference equation 
Equation: 


y(n) = 0.18212(n) + 0.7865x(n — 1) — 0.6804x(n — 2) + x(n — 3) + 0.6804y(n — 1) — 0.7865y(n — 2) +( 


When w; = 0.317, then the delay is Math input error. The delay is illustrated in [link]: 


(n) (INPUT) 
48 


x 
1 


20 22 24 26 28 30 R 34 36 38 40 


= 
= & 


° 


(<n) Piel 


1 
a 


Notice that the delay is fractional --- the discrete-time samples are not exactly reproduced in the output. The 
fractional delay can be interpreted in this case as a delay of the underlying continuous-time cosine signal. 


WHY LINEAR-PHASE: EXAMPLE (2) 


Consider the same system given on the previous slide, but let us change the frequency of the cosine signal. 


When w2 = 0.477, then the delay is Math input error. 


= 
= 


(n) (INPUT) 
48 


%5 
I 


2 22 24 26 28 30 R 34 36 38 40 


} (OUTPUT) 
6.8 


y(n 


Note: For this example, the delay depends on the frequency, because this system does not have linear phase. 


WHY LINEAR-PHASE: MORE 


From the previous slides, we see that a filter will delay different frequency components of a signal by the same 
amount if the filter has linear phase (constant phase delay). 


In addition, when a narrow band signal (as in AM modulation) goes through a filter, the envelop will be delayed by 
the group delay or envelop delay of the filter. The amount by which the envelop is delayed is independent of the 
carrier frequency only if the filter has linear phase. 


Also, in applications like image processing, filters with non-linear phase can introduce artifacts that are visually 
annoying. 


Filter Structures 


A realizable filter must require only a finite number of computations per 
output sample. For linear, causal, time-Invariant filters, this restricts one to 
rational transfer functions of the form 


7 bok bie ha Oe 
eae! ggg? cae 


H(z) 
Assuming no pole-zero cancellations, H(z) is FIR if Vi,i > 0: (a; = 0), 
and IIR otherwise. Filter structures usually implement rational transfer 
functions as difference equations. 


Whether FIR or IIR, a given transfer function can be implemented with 
many different filter structures. With infinite-precision data, coefficients, 
and arithmetic, all filter structures implementing the same transfer function 
produce the same output. However, different filter strucures may produce 
very different errors with quantized data and finite-precision or fixed-point 
arithmetic. The computational expense and memory usage may also differ 
greatly. Knowledge of different filter structures allows DSP engineers to 
trade off these factors to create the best implementation. 


Overview of Digital Filter Design 
Advantages of FIR filters 


1. Straight forward conceptually and simple to implement 

2. Can be implemented with fast convolution 

3. Always stable 

4. Relatively insensitive to quantization 

5. Can have linear phase (same time delay of all frequencies) 


Advantages of IIR filters 


1. Better for approximating analog systems 

2. For a given magnitude response specification, IIR filters often require 
much less computation than an equivalent FIR, particularly for narrow 
transition bands 


Both FIR and IIR filters are very important in applications. 
Generic Filter Design Procedure 


1. Choose a desired response, based on application requirements 
2. Choose a filter class 

3. Choose a quality measure 

4. Solve for the filter in class 2 optimizing criterion in 3 


Perspective on FIR filtering 


Most of the time, people do L°® optimal design, using the Parks-McClellan 
algorithm, This is probably the second most important technique in 
"classical" signal processing (after the Cooley-Tukey (radix-2) FFT). 


Most of the time, FIR filters are designed to have linear phase. The most 
important advantage of FIR filters over IIR filters is that they can have 
exactly linear phase. There are advanced design techniques for minimum- 
phase filters, constrained L? optimal designs, etc. (see chapter 8 of text). 
However, if only the magnitude of the response is important, IIR filers 
usually require much fewer operations and are typically used, so the bulk of 
FIR filter design work has concentrated on linear phase designs. 


Window Design Method 


The truncate-and-delay design procedure is the simplest and most obvious FIR design procedure. 
Exercise: 


Problem: Is it any Good? 
Solution: 


Yes; in fact it's optimal! (in a certain sense) 


L2 optimization criterion 


find Vn,0 <n < M —1: (A[n]), maximizing the energy difference between the desired response and the actual 
response: i.e., find 


min {ln} f (1a(w) — #4(w))) aw} 


TT 


by Parseval's relationship 
Equation: 


min pin {ainl, J”. (\Ha(w) — H(w)|)? d w} 


I 


2 Ooo (|ealn] — hin)” 


= 2m (Soytce (lealn] — Afr)? + ONG! (Vhaln] — Afr)? + 
Since Math input error this becomes 
wT -1 M-— y lore) 
min jn] {ata | (|Ha(w) — H(w)|)? av} = SO (|halnjl)? + * +S (\haln] 
—T h=—oo 2, n=M 


Note: h[n] has no influence on the first and last sums. 


The best we can do is let 


nl lanes if0<n<M-1 


if else 


a -{' if 0 <n(M -1) 


if else 


is optimal in a least-total-sqaured-error ( 2, or energy) sense! 
Exercise: 


Problem: Why, then, is this design often considered undersirable? 


Solution: 


Gibbs Phenomenon 


A(w), small M A(w), large M 


_-~0.11 + 1.0 


™~_0.11 


For desired spectra with discontinuities, the least-square designs are poor in a minimax (worst-case, or D9) error 
sense. 


Window Design Method 


Apply a more gradual truncation to reduce "ringing" (Gibb's Phenomenon) 
Yn0<n<M-—t1h|n|=hg[n|w[n] : (n0<n<M-—1h[n|=ha|n|w{[n]) 


Note: H(w) = Ha(w)*W(w) 


The window design procedure (except for the boxcar window) is ad-hoc and not optimal in any usual sense. 
However, it is very simple, so it is sometimes used for "quick-and-dirty" designs of if the error criterion is itself 
heurisitic. 


Frequency Sampling Design Method for FIR filters 


Given a desired frequency response, the frequency sampling design method designs a filter 
with a frequency response exactly equal to the desired response at a particular set of 
frequencies wy. 
Equation: 

Procedure 


n=0 


M-1 
Vk, | [o, | eee .N = 1] : (21009 — S> Honjo) 


Note: Desired Response must incluce linear phase shift (if linear phase is desired) 


Exercise: 


Problem: What is H4(w) for an ideal lowpass filter, cotoff at w.? 


Solution: 


e ("5") if —We<w<u, 
0 if (-7<w< -w,) V (we <w<n7) 


Note: This set of linear equations can be written in matrix form 


Equation: 


Equation: 


Ha(wo) e (iw00) eo (wel) e- (iwo(M-1)) h(0) 
Ha(w1) e~ (iu10) eo (iwil) e- (wr (M-1)) h(1) 
Ha(wn-1) e (twu-10) — @—(twau-11) e~ (iwu-1(M—1)) h(M — 1) 
or 
Hy= Wh 
So 
Equation: 
h=W'Ha 


Note: W is a square matrix for N = M, and invertible as long as w; 4 w; + 2ml,i Fj 


Important Special Case 


What if the frequencies are equally spaced between 0 and 27, i.e. w, = 2ak +a 


Then 
= - Inkn . ee . - Inkn 
Ha(we) = S> A(nje eC) = Se (A(nje Ho Jel i) = DFT! 
n=0 n=0 
sO 
. 1 a - Iank 
h(n)e~ %”) iW 2 Ha(wz)e’ ™ 
or 
etan pte - 2ank . 
hin] = 5 S> Halwgle = ce" IDFT [Halwa] 
k=0 


Important Special Case #2 


h|n] symmetric, linear phase, and has real coefficients. Since h[n] = h|M — nl, there are 
only x degrees of freedom, and only a linear equations are required. 
Equation: 


Hwy] = cet h[nje~ sr) 
z hin] (e~ (iwxn) 4 e—(ur(M—n—1))) if M even 
= re hin] (e~ (iwen) 4 © SE) (n[ Je — (iw )) if M odd 


— (iene Not, " 3 h[n] cos(w;, (44+ — n)) if Meven 


e~ (wn ayia h[n| cos(w;, (“44+ —n)) + h[4+*] if M odd 
Removing linear phase from both sides yields 


2S na h(n] cos(w, (44+ —n)) if Meven 
Dak ra h[n| cos(w, (4 —n)) + A[ 44] if Modd 


A(wr) = 


Due to symmetry of response for real coefficients, only 4 w on w € [0, 77) need be 
specified, with the frequencies —w, thereby being implicitly defined also. Thus we have 


x real-valued simultaneous linear equations to solve for h|n]. 


Special Case 2a 


h|n] symmetric, odd length, linear phase, real coefficients, and w; equally spaced: 
Vk,O<k<M-1: (w= @) 
Equation: 

h{n] = IDFT [Ha(wrx)| 


= Dy A(wee Or) Mo eit 


= Ay) Ake Gr (ms) 


To yield real coefficients, A(w) mus be symmetric 
(A(w) = A(—w)) = (AlA] = A[M — k)) 


Equation: 


ial ( A(0) +35,%, AlK] (cit 4) “i eee) )) 
(A(0) +25,.% AIR] cos(2## (n — 454))) 


(A(0) +20,%) AlK](-1)* cos(2## (n + 4))) 


| 
a oe 


Simlar equations exist for even lengths, anti-symmetric, and a = - filter forms. 


Comments on frequency-sampled design 


This method is simple conceptually and very efficient for equally spaced samples, since 
h|n] can be computed using the IDFT. 


H(w) for a frequency sampled design goes exactly through the sample points, but it may 
be very far off from the desired response for w ~ wy. This is the main problem with 
frequency sampled design. 


Possible solution to this problem: specify more frequency samples than degrees of 
freedom, and minimize the total error in the frequency response at all of these samples. 


Extended frequency sample design 


For the samples H(w,) where0 <k < M-—1andWN > M, find h|n], where 
0<n< M —1 minimizing || Hy(w,) — H(w,) || 


For || / ||, norm, this becomes a linear programming problem (standard packages 
availble!) 


Here we will consider the || / ||, norm. 


To minimize the || 1 ||, norm; that is, )>”)' |Ha(we) — H(wx)|, we have an 
overdetermined set of linear equations: 


eo (iwo) =p (iwo( M1) Ha(wo) 
Ha(w1) 
. a; * h mn - 
—(iwy_10) —(iwn—1(M—1)) j 
e wee, E Ha(wn-1) 


or 


Wh= Ha 


— = Lis 
The minimum error norm solution is well known to be h = (ww) W Ag; 


_—_ = 
(ww) W is well known as the pseudo-inverse matrix. 


Note:Extended frequency sampled design discourages radical behavior of the frequency 
response between samples for sufficiently closely spaced samples. However, the actual 
frequency response may no longer pass exactly through any of the Hy(w,). 


Parks-McClellan FIR Filter Design 


The approximation tolerances for a filter are very often given in terms of the maximum, or 
worst-case, deviation within frequency bands. For example, we might wish a lowpass filter in a 
(16-bit) CD player to have no more than 5 bit deviation in the pass and stop bands. 


t= 2 (Fig) Hie FF lel ea 
Ha) =| ar = | (w)|= 517 lw] < wy 


sir > |H(w)| if ws < |w| <a 


The Parks-McClellan filter design method efficiently designs linear-phase FIR filters that are 
optimal in terms of worst-case (minimax) error. Typically, we would like to have the shortest- 
length filter achieving these specifications. Figure [link] illustrates the amplitude frequency 
response of such a filter. 


The black boxes on the left and right are the passbands, the 

black boxes in the middle represent the stop band, and the 

space between the boxes are the transition bands. Note that 
overshoots may be allowed in the transition bands. 


Exercise: 


Problem: Must there be a transition band? 


Solution: 


Yes, when the desired response is discontinuous. Since the frequency response of a finite- 
length filter must be continuous, without a transition band the worst-case error could be no 
less than half the discontinuity. 


Formal Statement of the L-o»o (Minimax) Design Problem 


For a given filter length (/) and type (odd length, symmetric, linear phase, for example), and a 
relative error weighting function W(w), find the filter coefficients minimizing the maximum 
error 


argminargmax |/(w)| =argmin || E(w) || ,, 
h weF h 


where 
E(w) = W(w) (Haw) — H()) 


and F' is a compact subset of w € [0, 7] (i.e., all w in the passbands and stop bands). 


Note: Typically, we would often rather specify || E(w) ||,, < 6 and minimize over M and h; 
however, the design techniques minimize 6 for a given M. One then repeats the design 
procedure for different M until the minimum MM satisfying the requirements is found. 


We will discuss in detail the design only of odd-length symmetric linear-phase FIR filters. 
Even-length and anti-symmetric linear phase FIR filters are essentially the same except for a 
slightly different implicit weighting function. For arbitrary phase, exactly optimal design 
procedures have only recently been developed (1990). 


Outline of L-co Filter Design 


The Parks-McClellan method adopts an indirect method for finding the minimax-optimal filter 
coefficients. 


1. Using results from Approximation Theory, simple conditions for determining whether a 
given filter is 1° (minimax) optimal are found. 

2. An iterative method for finding a filter which satisfies these conditions (and which is thus 
optimal) is developed. 


That is, the L filter design problem is actually solved indirectly. 


Conditions for L-0»o Optimality of a Linear-phase FIR Filter 


All conditions are based on Chebyshev's "Alternation Theorem," a mathematical fact from 
polynomial approximation theory. 


Alternation Theorem 


Let F' be a compact subset on the real axis x, and let P(x) be and Lth-order polynomial 


L 
P(e) = [> apa* 
k=0 


Also, let D(x) be a desired function of x that is continuous on F’, and W(z) a positive, 
continuous weighting function on F’. Define the error E(x) on F' as 


and 


|| E(x) ||, =argmax |E(z)| 
2eFk 


A necessary and sufficient condition that P(a) is the unique Lth-order polynomial minimizing 
|| E(x) ||,, is that E(x) exhibits at least L + 2 "alternations;" that is, there must exist at least 
D+ 2 values of x, 2, € F,k = [0,1,..., 4-1], such that ro < 2] <... < @p42 and such 
that (etx) = —E(ans1) = +(\| E len) 

Exercise: 


Problem: What does this have to do with linear-phase filter design? 
Solution: 


It's the same problem! To show that, consider an odd-length, symmetric linear phase filter. 
Equation: 


Hw) = SMG hnje 


= ew) (n(45*) LOS Ss n( —n) cos(wn) 
Equation: 
A(w) = hA(L) +2 De h(L — n) cos(wn) 


Where L = “=!, 


Using trigonometric identities (such as 
cos(na) = 2cos((n — 1)a) cos(a) — cos((n — 2)a)), we can rewrite A(w) as 


A(w) = hA(L) 4+ 2 >», h(L — n) cos(wn) ay a, cos (w) 


where the a, are related to the h(n) by a linear transformation. Now, let 2 = cos(w). This 
is a one-to-one mapping from z € [—1, 1] onto w € [0, z]. Thus A(w) is an Lth-order 
polynomial in z = cos(w)! 


Note:The alternation theorem holds for the °° filter design problem, too! 


Therefore, to determine whether or not a length-/, odd-length, symmetric linear-phase 
filter is optimal in an LD sense, simply count the alternations in 

E(w) = W(w) (Aa(w) — A(w)) in the pass and stop bands. If there are L + 2 = “43 or 
more alternations, h(n), 0 <n < M — 1 is the optimal filter! 


Optimality Conditions for Even-length Symmetric Linear-phase Filters 


For M even, 
L 
A(w) = A(L -_ 
(w) , ( )oos(w (n+ 5)) 
where L = _ — 1 Using the trigonometric identity 


cos(a + 8) = cos(a@ — 8) + 2cos(a) cos(f) to pull out the $ term and then using the other 
trig identities, it can be shown that A(w) can be written as 


A(w) = cos( = ) Ss arcost w) 


Again, this is a polynomial in « = cos(w), except for a weighting function out in front. 
Equation: 


E(w) = W(w) (Aa(w) — Al) 
= W(w) (Aa(w) — cos(#) P(w)) 


W(w) cos(#) (Ase - P(w)) 


vole 


which implies 
Equation: 


where 


and 


eal cos( +(cos(z))*) 


Again, this is a polynomial approximation problem, so the alternation theorem holds. If E(w) 
has at least D + 2 = ae + 1 alternations, the even-length symmetric filter is optimal in an L™ 
sense. 


The prototypical filter design problem: 
1 if jw) <w, 
MS | & if feel < lal 


See [link]. 


L-oo Optimal Lowpass Filter Design Lemma 


1. The maximum possible number of alternations for a lowpass filter is L + 3: The proof is 
that the extrema of a polynomial occur only where the derivative is zero: ae = 0. Since 
P'(a) is an (LZ — 1)th-order polynomial, it can have at most ZL — 1 zeros. However, the 
mapping x = cos(w) implies that oA) = 0 at w = 0 and w = 7, for two more possible 
alternation points. Finally, the band edges can also be alternations, for a total of 
DZ-1+2+2=L +3 possible alternations. 

2. There must be an alternation at either w = 0orw = 7. 

3. Alternations must occur at w, and ws. See [link]. 


4. The filter must be equiripple except at possibly w = 0 or w = 7. Again see [link]. 


Note:The alternation theorem doesn't directly suggest a method for computing the optimal 
filter. It simply tells us how to recognize that a filter is optimal, or isn't optimal. What we need 
is an intelligent way of guessing the optimal filter coefficients. 


In matrix form, these LZ + 2 simultaneous equations become 


1 cos(wo) cos(2wy) ... — cos(Lw) Wa} A 
. 220) sey a. ae h(L) a(wo) 
nl) | 
h(0) 
ee ° Anton 
1 cos(wz41) cos(2wz41) ... cos(Lwy+1) W(wisi) 
or 


v(i)-4 


So, for the given set of Z + 2 extremal frequencies, we can solve for h and 6 via 

(hd)? = W-1A,. Using the FFT, we can compute A(w) of h(n), on a dense set of 
frequencies. If the old w, are, in fact the extremal locations of A(w), then the alternation 
theorem is satisfied and h(n) is optimal. If not, repeat the process with the new extremal 
locations. 


Computational Cost 


O(L’) for the matrix inverse and N log, N for the FFT (NV > 32L, typically), per iteration! 
This method is expensive computationally due to the matrix inverse. 
A more efficient variation of this method was developed by Parks and McClellan (1972), and is 


based on the Remez exchange algorithm. To understand the Remez exchange algorithm, we 
first need to understand Lagrange Interpoloation. 


Now A(w) is an Lth-order polynomial in z = cos(w), so Lagrange interpolation can be used to 
exactly compute A(w) from ZL + 1 samples of A(w,), & = (0, 1, 2,..., L]. 


Thus, given a set of extremal frequencies and knowing 6, samples of the amplitude response 
A(w) can be computed directly from the 
Equation: 


without solving for the filter coefficients! 
This leads to computational savings! 


Note that [link] is a set of Z + 2 simultaneous equations, which can be solved for 6 to obtain 
(Rabiner, 1975) 


Equation: 
5- ico VkAa(wr) 
E+1 (1) x 
where 
7 L+1 1 
oo sith cos(w%,) — cos(w;) 


The result is the Parks-McClellan FIR filter design method, which is simply an application of 
the Remez exchange algorithm to the filter design problem. See [link]. 


Tnitial guess of L+2 
extremal fequencies 


Compute 6 using 


the equation given 


Using Lagrange interpolation 
compute dense set of samples 
(typically, 16*L) of Af) over the 
pass and stop bands 


Determine new L+2 
largest extrema 


es 
Compute h(n) 


Alternation Theorem satisfied? 


The initial guess of extremal frequencies is 
usually equally spaced in the band. 
Computing 6 costs O(L’). Using Lagrange 
interpolation costs O(16LL) ~ O(16L7). 
Computing h(n) costs O(L*), but it is only 


done once! 


The cost per iteration is O(16L7) , aS Opposed to O(L*); much more efficient for large L. Can 
also interpolate to DFT sample frequencies, take inverse FFT to get corresponding filter 
coefficients, and zeropad and take longer FFT to efficiently interpolate. 


FIR Filter Design using MATLAB 
FIR Filter Design Using MATLAB 


Design by windowing 


The MATLAB function firi( ) designs conventional lowpass, highpass, 
bandpass, and bandstop linear-phase FIR filters based on the windowing 
method. The command 


b = firi(N,Wn) 


returns in vector b the impulse response of a lowpass filter of order N. The 
cut-off frequency Wn must be between 0 and 1 with 1 corresponding to the 
half sampling rate. 


The command 


b = fir1(N,Wn, 'high') 


returns the impulse response of a highpass filter of order N with normalized 
cutoff frequency Wn. 


Similarly,b = firi(N,Wn, 'stop' ) with Wn a two-element vector 
designating the stopband designs a bandstop filter. 


Without explicit specification, the Hamming window is employed in the 
design. Other windowing functions can be used by specifying the 
windowing function as an extra argument of the function. For example, 


Blackman window can be used instead by the command b = fir1i(N, 
Wn, blackman(N) ). 


Parks-McClellan FIR filter design 


The MATLAB command 


b = remez(N,F,A) 


returns the impulse response of the length N+1 linear phase FIR filter of 
order N designed by Parks-McClellan algorithm. F is a vector of frequency 
band edges in ascending order between 0 and 1 with 1 corresponding to the 
half sampling rate. A is a real vector of the same size as F which specifies 
the desired amplitude of the frequency response of the points 
(F(k),A(k)) and (F(k+1),A(k+1) ) for odd k. For odd k, the bands 
between F(k+1) and F(k+2) is considered as transition bands. 


MATLAB FIR Filter Design Exercise 
FIR Filter Design MATLAB Exercise 


Design by windowing 


Exercise: 


Problem: 
Assuming sampling rate at 48kHz, design an order-40 low-pass filter 
having cut-off frequency 10kHz by windowing method. In your 


design, use Hamming window as the windowing function. 


Solution: 


b = fir1(40,10.0/48.0) 


Parks-McClellan Optimal Design 


Exercise: 


Problem: 

Assuming sampling rate at 48kHz, design an order-40 lowpass filter 
having transition band 10kHz-11kHz using the Parks-McClellan 
optimal FIR filter design algorithm. 


Solution: 


b = remez(40,[1 1 0 O],[0 10/48 
11/48 1]) 


Upsampling 


Upsampling 
The operation of upsampling by factor L © N describes the insertion of 


L — 1 zeros between every sample of the input signal. This is denoted by " 
7 (Z)" in block diagrams, as in [link]. 


xm] | th - yln] 


Formally, upsampling can be expressed in the time domain as 
x\>| if eZ 
yln] = { ln] if 
0 otherwise 


In the z-domain, 
Y¥{z)= So ylnlz = » fe |e = So a[k]z% = X(2") 


and substituting z = e™ for the DTFT, 
Equation: 


Y (e”) = X(e”) 


As shown in [link], upsampling compresses the DTFT by a factor of L 
along with the w axis. 


Downsampling 
(Blank Abstract) 


The operation of downsampling by factor / € N describes the process of 


keeping every M*” sample and discarding the rest. This is denoted by " 
{ (M)" in block diagrams, as in [link]. 


x{m|] {M |— yjn] 


Formally, downsampling can be written as 
y|n] = 2|nM] 


In the z domain, 
Equation: 


Y(z) = 


mlm] (fp DM! ett) 237 


a 1 if mis a multiple of M 
where 5- yer eiwpm — ‘ 
P 0 otherwise 


Equation: 
¥(2) = UME elm] (ee) zt)” 
= 4 DI X(e i) zi) 


Translating to the frequency domain, 
Equation: 


As shown in [link], downsampling expands each 27 -periodic repetition of 
xX (e”) by a factor of M along the w axis, and reduces the gain by a factor 
of M. If x|m] is not bandlimited to 5, aliasing may result from spectral 
overlap. 


Note: When performing a frequency-domain analysis of systems with 
up/downsamplers, it is strongly recommended to carry out the analysis in 
the z-domain until the last step, as done above. Working directly in the e’”- 
domain can easily lead to errors. 


x{m] X(e”) 


~ WwW 


Interpolation 


Interpolation 


Interpolation is the process of upsampling and filtering a signal to increase 
its effective sampling rate. To be more specific, say that z[m] is an 
(unaliased) T-sampled version of x,(t) and v[n] is an L-upsampled version 
version of x|m]. If we filter v[m] with an ideal +-bandwidth lowpass filter 
(with DC gain L) to obtain y[n], then y[n] will be a +-sampled version of 


x-(t). This process is illustrated in [link]. 


We justify our claims about interpolation using frequency-domain 
arguments. From the sampling theorem, we know that T- sampling x,(t) to 
create z[n] yields 

Equation: 


53 1 Ww — Ik 
X(e ) = FLX) 


After upsampling by factor Z, [link] implies 


won FE) HE 


Lowpass filtering with cutoff + and gain L yields 


ESE) tele) 


Y ta 


since the spectral copies with indices other than k = [LZ (forl € Z) are 
removed. Clearly, this process yields a + -shaped version of x(t). [link] 
illustrates these frequency-domain arguments for L = 2. 


DC gain 2 
Fa v[n] | 


z{m] —" 2 ~ LPF = |\——_+ y/[n] 


|X(e")| |V(e™*)| EM) 


H l ‘ ; ee ' 2 

} ~\ } ‘a ] ~\ | f~ "| (*) an (*) ft | { 1 3 ( 

' ' } ' ' ' ' ' | ' ' ry 
WLW EE ELE EE 


Q Qs 037 2n QO ® 2x 


Application of Interpolation - Oversampling in CD Players 


Application of Interpolation- Oversampling in CD Players 


The digital audio signal on a CD is a 44.1 kHz sampled representation of a 
continuous signal with bandwidth 20 kHz. With a standard ZOH-DAC, the 
analog reconstruction filter would have passband edge at 20 kHz and 
stopband edge at 24.1 kHz. (See [link]) With such a narrow transition band, 
this would be a difficult (and expensive) filter to build. 


|\X(e™)| |X2(j22)| |H-(j2)| 
: : 


| a TF \24,1k 
! | \v. 2 \f 2 
" Qn” : 


7 | 
0 


Sd ad - 2 
0 20k 44.1k 0 2k 441k 


If digital interpolation is used prior to reconstruction, the effective sampling 
rate can be increased and the reconstruction filter's transition band can be 
made much wider, resulting in a much simpler (and cheaper) analog filter. 
[link] illustrates the case of interpolation by 4. The reconstruction filter has 
passband edge at 20 kHz and stopband edge at 156.4 kHz, resulting in a 
much wider transition band and therefore an easier filter design. 


|\X(e*)| |X, (j22)| |H,(j2)| 
* 6 ‘ 
| | L—, 
20k 156.4k | \ '156.4k 
f leet om 20k “\\__ \ ™ 
-wW : a Ox / In 


0. 7 Qn 0 i76.4k 0 176.4k 


Decimation 


Decimation is the process of filtering and downsampling a signal to 
decrease its effective sampling rate, as illustrated in [link]. The filtering is 
employed to prevent aliasing that might otherwise result from 
downsampling. 

DC gain 1 


z{m] —=LPF fj vm “4M — yln] 


To be more specific, say that 
tit t) = x1(t) + ei (t} 


where ,(t) is a lowpass component bandlimited to 577 ur Hz and x(t) isa 
bandpass component with energy between 5 sur a nd = + Hz. If sampling 
z(t) with interval T yields an unaliased discrete representation z{m], then 


decimating x|m] by a factor M will yield y[n], an unaliased MT-sampled 
representation of lowpass component 2/(t). 


We offer the following justification of the previously described decimation 
procedure. From the sampling theorem, we have 


— 2rk 1 w — 27k 
x = 4S 
el re he a 
The bandpass component Xy(7£2) is the removed by +7-lowpass filtering, 


giving 
1 w— 27k 
2 Vigo 
T 2 (i T ) 


Finally, downsampling yields 
Equation: 


¥(e) = bp Spl! D(A | 


_ aT a oo: Xi(i ss) 
_ a Xi (i van | 


which is clearly a MT-sampled version of x;(t). A frequency-domain 
illustration for M = 2 appears in [link]. 


DC gain 1 
vim] 
{mn —-| LPF § | of y2 Lf 
» ee IV (e%)| 
i 1] 


m) —— ~~ 
A “ 


Resampling with Rational Factor 


Interpolation by LZ and decimation by M can be combined to change the 
effective sampling rate of a signal by the rational factor va . This process is 
called resampling or sample-rate conversion. Rather than cascading an 
anti-imaging filter for interpolation with an anti-aliasing filter for 
decimation, we implement one filter with the minimum of the two cutoffs 
Ee | and the multiplication of the two DC gains (L and 1), as 


illustrated in [link]. 


DC gain L 


tL LPF min{ +, 77} 


Digital Filter Design for Interpolation and Decimation 


First we treat filter design for interpolation. Consider an input signal x|n| 
that is wo-bandlimited in the DTFT domain. If we upsample by factor L to 
get v(m], the desired portion of V (e™”) is the spectrum in i= 7 ) , while 
the undesired portion is the remainder of |—7, 77). Noting from [link] that 
Vie™) has zero energy in the regions 

Equation: 


2kr+wo 2(k+1)r — wo 


, KE 
L L 


the anti-imaging filter can be designed with transition bands in these 
regions (rather than passbands or stopbands). For a given number of taps, 
the additional degrees of freedom offered by these transition bands allows 
for better responses in the passbands and stopbands. The resulting filter 
design specifications are shown in the bottom subplot below. 

|X (e*)| 


lV (e"*)| 
i 


z * , as * dz i iS ae 
0 wy / L 2a un i 2z+un my dz-uy P 4x+uo a 6x-u 
, . L I E 
|H (e*)| 
1 : 
} | 
/ 1 | 
' ' 
Bu: iid or 
0 z a , az * dz fs 
“ I a= 0, 2 tun L fz-uy “, 4e+u0 i ae a 
I L I 


Next we treat filter design for decimation. Say that the desired spectral 
component of the input signal is bandlimited to 47 < 7; and we have 


decided to downsample by M. The goal is to minimally distort the input 
spectrum over as ae), i.e., the post-decimation spectrum over 


[—wy, wo). Thus, we must not allow any aliased signals to enter [—wy, wo). 


To allow for extra degrees of freedom in the filter design, we do allow 
aliasing to enter the post-decimation spectrum outside of |—w, wo) within 
|—7, 7). Since the input spectral regions which alias outside of [—wo, wo) 
are given by 

Equation: 


2kr+ wy 2(k+1)r — wo 


’ KE 
L L 


(as shown in [link]), we can treat these regions as transition bands in the 
filter design. The resulting filter design specifications are illustrated in the 
middle subplot ([link]). 


|X (e*)| 
0 we / F \ one % Qxtuy yy . da wo 1 dato 4 UV ss rea 
lH (e*)) W M VW M : Ww 
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a 
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\ j | , | 
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ly(e*)| 


Noble Identities 


The Noble identities (illustrated in [link] and [link]) describe when it is 
possible to reverse the order of upsampling/downsampling and filtering. We 
prove the Noble identities showing the equivalence of each pair of block 
diagrams. 


The Noble identity for interpolation can be depicted as in [link]: 


fn] ef ge Ly ret) Le vm) xf] ef arte) LL Le iy 


For the left side of the diagram, we have 
Y= H(z”)Vi(z) 


where V;(z) = X(z*) 
while for the right side, 
where V2(z) = H(z)X(z) 
Y(z)= H(z”) X(z’) 
Thus we have established the Noble identity for interpolation. 


The Noble identity for decimation can be depicted as in [link]: 


v(m] | - wv.) vein - 
z{n] —+|M on H(z) —~y[m] <> 2[n] —~ A(2™)}—+ | M -~ y|m] 


For the left side of the preceding diagram, we have 


es : 


Vi(z) = ‘Vi X eirk, 
k 0 


Rr 


Equation: 
Y(z) = A(z)Vi(z) 


veo X eC )ikzw 


while for the right side, 


Equation: 
Y(z) 1 a (—1) 35k ar 
z)=— ye z 
M x9 
where V2(z) = X(z)H(z™) 
Equation: 
YQ) = tye x e(itkza H el arkM gir 


H(z); ae, 4 e(-t) ark gaz 


Thus we have established the Noble identity for decimation. Note that the 
impulse response of H (Z* ) is the L-upsampled impulse response of H(z). 


Polyphase Interpolation 


Polyphase Interpolation Filter 


Recall the standard interpolation procedure illustrated in [link]. 


ele] a] th |—+) (2) |—= ol 


Note that this procedure is computationally inefficient because the lowpass 
filter operates on a sequence that is mostly composed of zeros. Through the 
use of the Noble identities, it is possible to rearrange the preceding block 
diagram so that operations on zero-valued samples are avoided. 


In order to apply the Noble identity for interpolation, we must transform 
H(z) into its upsampled polyphase components H,, as \; 


DP =40j2.4406— 1}, 
Equation: 
H(z) = dinn hin i - 
Dk ye  A[kL + plz?) 


via k = Rane = nmod L 
Equation: 


via hp|k] = h[kL + pl 
Equation: 


Above, |-| denotes the floor operator and - mod M the modulo-M 
operator. Note that the p*® polyphase filter h,[k] is constructed by 
downsampling the "master filter" h[n] at offset p. Using the unsampled 
polyphase components, the [link] diagram can be redrawn as in [link]. 


an] — th | H(z") +) yl] 
zl 
a 
~ E(z") A+) 
z7 
zo 
~ Hy_1(z") 


Applying the Noble identity for interpolation to [link] yields [link]. The 
ladder of upsamplers and delays on the right below accomplishes a form of 
parallel-to-serial conversion. 


ae} fe 


Polyphase Decimation Filter 
Implementation of polyphase decimation filters. 


Polyphase Decimation 


Recall the standard decimation method in [link]. 


z{n] —- H(z 4M |— yf y(m|] 


Note that this procedure is computationally inefficient because it discards 
the majority of the computed filter outputs. Through the use of the Noble 
identities, it is possible to rearrange [link] so that filter outputs are not 
discarded. 


In order to apply the Noble identity for decimation, we must transform 
H(z) into its upsampled polyphase components H, Cag ), 

p = {0,..., M — 1}, defined previously in the context of polyphase 
interpolation. 

Equation: 


H(2) = Lon hln|e” 
Tae GT ARM + pe)? 


viak = lar |> P = n mod M 
Equation: 


via h,[k] = h[kM + p| 


Equation: 


Using these unsampled polyphase components, the preceding block 
diagram can be redrawn as [link]. 


Paes 


z{n} | Hy(z™) | as -) JM = y{m] 
z-! / 
| Ay (z™) | 
zy! 
y-! 
Hy-i(z™) | 


Applying the Noble identity for decimation to [link] yields [link]. The 
ladder of delays and downsamplers on the left below accomplishes a form 
of serial-to-parallel conversion. 


Computational Savings of Polyphase Interpolation/Decimation 


Computational Savings of Polyphase Interpolation/Decimation 


Assume that we design FIR LPF H(z) with N taps, requiring N multiplies 
per output. For standard decimation by factor M/, we have N multiplies per 
intermediate sample and M intermediate samples per output, giving NM 
multiplies per output. 


For polyphase decimation, we have a multiplies per branch and MW 
branches, giving a total of NV multiplies per output. The assumption of a“ 
multiplies per branch follows from the fact that h[n| is downsampled by 
to create each polyphase filter. Thus, we conclude that the standard 
implementation requires / times as many operations as its polyphase 
counterpart. (For decimation, we count multiples per output, rather than per 
input, to avoid confusion, since only every M* input produces an output.) 


From this result, it appears that the number of multiplications required by 
polyphase decimation is independent of the decimation rate Z. However, it 
should be remembered that the length NV of the =; -lowpass FIR filter H(z) 
will typically be proportional to /. This is suggested, e.g., by the Kaiser 
FIR-length approximation formula 


—10 log (6,6,) — 13 
2.3240 (w) 


where A(w) in the transition bandwidth in radians, and 6, and 6, are the 
passband and stopband ripple levels. Recall that, to preserve a fixed signal 
bandwidth, the transition bandwidth A(w) will be linearly proportional to 
the cutoff =, so that NV will be linearly proportional to M. In summary, 
polyphase decimation by factor M requires NV multiplies per output, where 
N is the filter length, and where JN is linearly proportional to M. 


Using similar arguments for polyphase interpolation, we could find 
essentially the same result. Polyphase interpolation by factor L requires N 
multiplies per input, where JV is the filter length, and where JV is linearly 


proportional to the interpolation factor L. (For interpolation we count 
multiplies per input, rather than per output, to avoid confusion, since M 
outputs are generated in parallel.) 


Sub-Band Processing 
Why Filterbanks? 


Sub-band Processing 


There exist many applications in modern signal processing where it is 
advantageous to separate a signal into different frequency ranges called 
sub-bands. The spectrum might be partitioned in the uniform manner 
illustrated in [link], where the sub-band width Aj; ar is identical for 


each sub-band and the band centers are uniformly spaced at intervals of +7. 


Alternatively, the sub-bands might have a logarithmic spacing like that 
shown in [link]. 


_—, 
ti 
x 6m: a a 
& 4 2 


For most of our discussion, we will focus on uniformly spaced sub-bands. 


The separation into sub-band components is intended to make further 
processing more convenient. Some of the most popular applications for sub- 


band decomposition are audio and video source coding (with the goal of 
efficient storage and/or transmission). 


[link] illustrates the use of sub-band processing in MPEG audio coding. 
There a psychoacoustic model is used to decide how much quantization 
error can be tolerated in each sub-band while remaining below the hearing 
threshold of a human listener. In the sub-bands that can tolerate more error, 
less bits are used for coding. The quantized sub-band signals can then be 
decoded and recombined to reconstruct (an approximate version of) the 
input signal. Such processing allows, on average, a 12-to-1 reduction in bit 
rate while still maintaining "CD quality" audio. The psychoacoustic model 
takes into account the spectral masking phenomenon of the human ear, 
which says that high energy in one spectral region will limit the ear's ability 
to hear details in nearby spectral regions. Therefore, when the energy in one 
sub-band is high, nearby sub-bands can be coded with less bits without 
degrading the perceived quality of the audio signal. The MPEG standard 
specifies 32-channels of sub-band filtering. Some psychoacoustic models 
also take into account "temporal masking" properties of the human ear, 
which say that a loud burst of sound will temporarily overload the ear for 
short time durations, making it possible to hide quantization noise in the 
time interval after a loud sound burst. 


In typical applications, non-trivial signal processing takes place between the 
bank of analysis filters and the bank of synthesis filters, as shown in [link]. 
We will focus, however, on filterbank design rather than on the processing 
that occurs between the filterbanks. 


a{n) 


Our goals in filter design are: 


1. Good sub-band frequency separation (i.e., good "frequency 
selectivity"). 

2. Good reconstruction (i.e., yn «2n_ d forsome integer delay d) 
when the sub-band processing is lossless. 


The first goal is driven by the assumption that the sub-band processing 
works best when it is given access to cleanly separated sub-band signals, 
while the second goal is motivated by the idea that the sub-band filtering 
should not limit the reconstruction performance when the sub-band 
processing (e.g., the coding/decoding) is lossless or nearly lossless. 


Discrete Wavelet Transform: Main Concepts 


Main Concepts 


The discrete wavelet transform (DWT) is a representation of a signal 

x(t) € Y» using an orthonormal basis consisting of a countably-infinite set 
of wavelets. Denoting the wavelet basis as {Wen(t)|kKE A ne },the 
DWT transform pair is 

Equation: 


x(t) = dinWkn (t) 


k -—oo tm —cO 
Equation: 


dim = (Wen(t), e(t)) 
= me Wrn(t)z(t) dt 


where {dj,,,} are the wavelet coefficients. Note the relationship to Fourier 
series and to the sampling theorem: in both cases we can perfectly describe 
a continuous-time signal x(t) using a countably-infinite (i.e., discrete) set 
of coefficients. Specifically, Fourier series enabled us to describe periodic 
signals using Fourier coefficients {X[k]|k € }, while the sampling 
theorem enabled us to describe bandlimited signals using signal samples 
{z[n]|n €  }. In both cases, signals within a limited class are represented 
using a coefficient set with a single countable index. The DWT can describe 
any signal in “5 using a coefficient set parameterized by two countable 
indices: {din|kKE Ane }. 


Wavelets are orthonormal functions in obtained by shifting and 
stretching a mother wavelet, ~(t) € %). For example, 
Equation: 


Vk,nk An€ : en(t)=2° 7p 2*t—n 


defines a family of wavelets {~i.n(t)|kK€ A ne€_ } related by power- 
of-two stretches. As k increases, the wavelet stretches by a factor of two; as 
nm increases, the wavelet shifts right. 


Note: When || 7)(t) ||= 1, the normalization ensures that || Hzn(t) ||= 1 
forallke ,ne 


Power-of-two stretching is a convenient, though somewhat arbitrary, 
choice. In our treatment of the discrete wavelet transform, however, we will 
focus on this choice. Even with power-of two stretches, there are various 
possibilities for 7)(t), each giving a different flavor of DWT. 


Wavelets are constructed so that {#,n(t)|n € } (ie., the set of all shifted 
wavelets at fixed scale k), describes a particular level of 'detail' in the 
signal. As k becomes smaller (i.e., closer to —oo), the wavelets become 
more "fine grained" and the level of detail increases. In this way, the DWT 
can give a multi-resolution description of a signal, very useful in analyzing 
"real-world" signals. Essentially, the DWT gives us a discrete multi- 
resolution description of a continuous-time signal in 7%. 


In the modules that follow, these DWT concepts will be developed "from 
scratch" using Hilbert space principles. To aid the development, we make 
use of the so-called scaling function y(t) € 2, which will be used to 
approximate the signal up to a particular level of detail. Like with 
wavelets, a family of scaling functions can be constructed via shifts and 
power-of-two stretches 

Equation: 


VE, nk ANE : Pen(t) = 2- ty 7a eae 


given mother scaling function y(t). The relationships between wavelets and 
scaling functions will be elaborated upon later via theory and example. 


Note: The inner-product expression for dz, [link] is written for the 
general complex-valued case. In our treatment of the discrete wavelet 
transform, however, we will assume real-valued signals and wavelets. For 


this reason, we omit the complex conjugations in the remainder of our 
DWT discussions 


The Haar System as an Example of DWT 


The Haar basis is perhaps the simplest example of a DWT basis, and we 
will frequently refer to it in our DWT development. Keep in mind, however, 
that the Haar basis is only an example; there are many other ways of 
constructing a DWT decomposition. 


For the Haar case, the mother scaling function is defined by [link] and 
[link]. 
Equation: 


o(t) 
1] 
—= tf 


0 1 


From the mother scaling function, we define a family of shifted and 
stretched scaling functions according to [link] and [link] 
Equation: 


ee 


g-k/2 
F 
0 V n2*k (n+1)2* 


which are illustrated in [link] for various and _. [link] makes clear the 
principle that incrementing by one shifts the pulse one place to the right. 
Observe from [link] that is orthonormal foreach  (i.e., 
along each row of figures). 


-1,o(t) 
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Filterbanks Interpretation of the Discrete Wavelet Transform 


Assume that we start with a signal x(t) € . Denote the best approximation at the 0* level of coarseness by 
x(t). (Recall that z(t) is the orthogonal projection of x(t) onto Vo .) Our goal, for the moment, is to decompose 
Z(t) into scaling coefficients and wavelet coefficients at higher levels. Since zo(t) € Vo and Vo = Vi © Wi, 
there exist coefficients {cg[n]}, {ci[n]}, and {d1[n]} such that 

Equation: 


to(t) = Yan col] ¥on(é) 
Man Clr] ¢inlt] + Van Alri nlt] 


I 


Using the fact that {y,,,,(#)| m € Z} is an orthonormal basis for Vj , in conjunction with the scaling equation, 
Equation: 


ci[n| 


| 


to(t), Yin(t)) 

Da aut 

] ((Yo,m 

] (e(t — m), ye h[dle(t — £— 2n))) 

m ©O cl Lew hid] (elt — m), e(t — £ — 2n))) 
| = 


w(t); Prn(t)) 
(t), Pin(t))) 


I 
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= Doan 
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a 
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where d[t — £— 2n] = (y(t — m), p(t — £ — 2n)). The previous expression ([link]) indicates that {c,[n]} results 
from convolving {co|m]} with a time-reversed version of h|m] then downsampling by factor two ({link]). 


co{m] —- G(z-*) ~ $2 —+ d,[n] 


Using the fact that {71(t)| € Z} is an orthonormal basis for Wj , in conjunction with the wavelet scaling 
equation, 
Equation: 


di[n] = (zo(t), Pin(t)) 
= (Yinm Colm] yo 
=. Dan 


sm(t); b1,n(€)) 
oy Pin(t))) 


—_~ 
aS 


co[m| 
= Limm Colm] ( m), die gléle(t — £ — 2n))) 
= Limm Colm] dee a (p(t — m), p(t — £— 2n))) 
= pares co [m]g[m - 2n] 


where é[t — £— 2n] = (y(t — m), p(t — £— 2n)). 


The previous expression ([link]) indicates that {d;|n]} results from convolving {co|m]} with a time-reversed 
version of g[m] then downsampling by factor two ([link]). 


co[m] —+| G(z) |—+ 42 _—- di [n} 


Putting these two operations together, we arrive at what looks like the analysis portion of an FIR filterbank 


([link]): 


H(z!) || [2 =e [n] 
co[m] ; 
G(z7t) #4} [2 = di[n] 


We can repeat this process at the next higher level. Since V; = W2 @ Va, there exist coefficients {c2[n]} and 
{d2[n]} such that 
Equation: 


tilt) = Yin cilr] yi n(t) 
Yin d2[n\ban(t) + Donn C2lr|P2,n(t) 


Using the same steps as before we find that 
Equation: 


coln] = S> ex [m|h[m — 2n] 
Equation: 


da{n] = > e[mlglm — 2n] 


which gives a cascaded analysis filterbank ([(link]): 


e;[m] —| H(2-*) |» {2 |» op{n] 


— H(2z-*) | | 2 = G(z-!) + {2 = da{n} 


€0 [4] ——*) G(z~*) --*) 42 ~ dy{m| 


If we use Vy = W, @ W, © Wz © --- © W;, @ V;,, to repeat this process up to the k** level, we get the iterated 
analysis filterbank ([link]). 


- of » 12 ~ dia) 
cain) eo A{e-*) em 42 
alm) ee Aiea *) |e yo ~ Gla") me 12 ~ dip) 
= Hi{s—') |» 12 ™ Gi") as 12 = dain) 


As we might expect, signal reconstruction can be accomplished using cascaded two-channel synthesis filterbanks. 
Using the same assumptions as before, we have: 
Equation: 


colm] = (xo(t), Pom(t)) 

(Man Culm] Pin (t) + Lan Alr}Pin(t); Pom(t)) 

yin 1ln] (Grn (Et), Pom(t))) + Lenn A1lr] ((b1n(t), Po,m(t))) 
Yinn Cilm|alm — 2n] + do nn dilm|glm — 2n] 


I 


nn& 


where h[m — 2n] = (Y1in(t), Yom(t)) 
and glm — 2n| = (Yin(t), Pom(é)) 


which can be implemented using the block diagram in [link]. 


j-{t2|-[ He) 
)-[12/-+| Ge) 


The same procedure can be used to derive 
Equation: 


@-+ alm 


= Si cg[njh[m — 2n] +S do[n]g[m — 2n] 


from which we get the diagram in [link]. 


cp[n]—=| $2 | H(z) 


ci{m] 


dp{n]—=|t2|-+| G(z) L@—{t2b+| He) |H 


d; {m] +{t2;-=| Giz) cold] 


To reconstruct from the kt# level, we can use the iterated synthesis filterbank ((link]). 


exim] —12—= Hie) 
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The table makes a direct comparison between wavelets and the two-channel orthogonal PR-FIR filterbanks. 


Discrete Wavelet Transform 2-Channel Orthogonal PR-FIR Filterbank 
Analysis- = 
ie H(z ) H(z) 
aus H(z)H(z"!) + H(-z)H(-2z71) =2 Ho(z)Ho(z-1) + Ho(—z)Ho(—z7 


Symmetry 


Discrete Wavelet Transform 2-Channel Orthogonal PR-FIR Filterbank 


or | Oe") (2) 

Sa VP, P is odd : (G(z) = +(z-? H(-z7))) VN, Nis even : (Hi(z) = +(z-8-) Hy (- 
ae H(z) Go(z) = 22° 8-D Hy (27) 

Sa G(z) Gi(z) = 22-N-)) A, (2) 


From the table, we see that the discrete wavelet transform that we have been developing is identical to two-channel 
orthogonal PR-FIR filterbanks in all but a couple details. 


1. Orthogonal PR-FIR filterbanks employ synthesis filters with twice the gain of the analysis filters, whereas in 
the DWT the gains are equal. 

2. Orthogonal PR-FIR filterbanks employ causal filters of length N, whereas the DWT filters are not 
constrained to be causal. 


For convenience, however, the wavelet filters H(z) and G(z) are usually chosen to be causal. For both to have 
even impulse response length N, we require that P = N — 1. 


DWT Application - De-noising 


Say that the DWT for a particular choice of wavelet yields an efficient 
representation of a particular signal class. In other words, signals in the 
class are well-described using a few large transform coefficients. 


Now consider unstructured noise, which cannot be eifficiently represented 
by any transform, including the DWT. Due to the orthogonality of the 
DWT, such noise sequences make, on average, equal contributions to all 
transform coefficients. Any given noise sequence is expected to yield many 
small-valued transform coefficients. 


Together, these two ideas suggest a means of de-noising a signal. Say that 
we perform a DWT on a signal from our well-matched signal class that has 
been corrupted by additive noise. We expect that large transform 
coefficients are composed mostly of signal content, while small transform 
coefficients should be composed mostly of noise content. Hence, throwing 
away the transform coefficients whose magnitude is less than some small 
threshold should improve the signal-to-noise ratio. The de-noising 
procedure is illustrated in [link]. 


noisy signal —+{ pwr | +lthreshoid threshold _IDWT | i—= de-noised signal 


Now we give an example of denoising a step-like waveform using the Haar 
DWT. In [link], the top right subplot shows the noisy signal and the top left 
shows it DWT coefficients. Note the presence of a few large DWT 
coefficients, expected to contain mostly signal components, as well as the 
presence of many small-valued coefficients, expected to contain noise. (The 
bottom left subplot shows the DWT for the original signal before any noise 
was added, which confirms that all signal energy is contained within a few 
large coefficients.) If we throw away all DWT coefficients whose 
magnitude is less than 0.1, we are left with only the large coefficients 
(shown in the middle left plot) which correspond to the de-noised time- 
domain signal shown in the middle right plot. The difference between the 
de-noised signal and the original noiseless signal is shown in the bottom 
right. Non-zero error results from noise contributions to the large 
coefficients; there is no way of distinguishing these noise components from 
signal components. 
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Introduction to Random Signals and Processes 


Before now, you have probably dealt strictly with the theory behind signals 
and systems, as well as look at some the basic characteristics of signals and 
systems. In doing so you have developed an important foundation; however, 
most electrical engineers do not get to work in this type of fantasy world. In 
many cases the signals of interest are very complex due to the randomness 
of the world around them, which leaves them noisy and often corrupted. 
This often causes the information contained in the signal to be hidden and 
distorted. For this reason, it is important to understand these random signals 
and how to recover the necessary information. 


Signals: Deterministic vs. Stochastic 


For this study of signals and systems, we will divide signals into two 
groups: those that have a fixed behavior and those that change randomly. As 
most of you have probably already dealt with the first type, we will focus 
on introducing you to random signals. Also, note that we will be dealing 
strictly with discrete-time signals since they are the signals we deal with in 
DSP and most real-world computations, but these same ideas apply to 
continuous-time signals. 


Deterministic Signals 


Most introductions to signals and systems deal strictly with deterministic 
signals. Each value of these signals are fixed and can be determined by a 
mathematical expression, rule, or table. Because of this, future values of 
any deterministic signal can be calculated from past values. For this reason, 
these signals are relatively easy to analyze as they do not change, and we 
can make accurate assumptions about their past and future behavior. 
Deterministic Signal 


An example of a deterministic signal, the sine wave. 


Stochastic Signals 


Unlike deterministic signals, stochastic signals, or random signals, are not 
so nice. Random signals cannot be characterized by a simple, well-defined 
mathematical equation and their future values cannot be predicted. Rather, 
we must use probability and statistics to analyze their behavior. Also, 
because of their randomness, average values from a collection of signals are 
usually studied rather than analyzing one individual signal. 
Random Signal 
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We have taken the above sine wave and added random 
noise to it to come up with a noisy, or random, signal. 
These are the types of signals that we wish to learn how 
to deal with so that we can recover the original sine 
wave. 


Random Process 


As mentioned above, in order to study random signals, we want to look at a 
collection of these signals rather than just one instance of that signal. This 
collection of signals is called a random process. 


random process 


A family or ensemble of signals that correspond to every possible 
outcome of a certain signal measurement. Each signal in this collection 
is referred to as a realization or sample function of the process. 


Example: 

As an example of a random process, let us look at the Random Sinusoidal 
Process below. We use f[n] = Asin(wn + y) to represent the sinusoid 
with a given amplitude and phase. Note that the phase and amplitude of 
each sinusoid is based on a random number, thus making this a random 
process. 


Random Sinusoidal Process 
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A random sinusoidal process, with the amplitude and 
phase being random numbers. 


A random process is usually denoted by X(t) or X[n], with x(t) or x[n] 
used to represent an individual signal or waveform from this process. 


In many notes and books, you might see the following notation and terms 
used to describe different types of random processes. For a discrete 
random process, sometimes just called a random sequence, ¢ represents 
time that has a finite number of values. If ¢ can take on any value of time, 
we have a continuous random process. Often times discrete and 
continuous refer to the amplitude of the process, and process or sequence 
refer to the nature of the time variable. For this study, we often just use 
random process to refer to a general collection of discrete-time signals, as 
seen above in [link]. 


Stationary and Nonstationary Random Processes 


Introduction 


From the definition of a random process, we know that all random 
processes are composed of random variables, each at its own unique point 
in time. Because of this, random processes have all the properties of 
random variables, such as mean, correlation, variances, etc.. When dealing 
with groups of signals or sequences it will be important for us to be able to 
show whether of not these statistical properties hold true for the entire 
random process. To do this, the concept of stationary processes has been 
developed. The general definition of a stationary process is: 


stationary process 
a random process where all of its statistical properties do not vary with 
time 


Processes whose statistical properties do change are referred to as 
nonstationary. 


Understanding the basic idea of stationarity will help you to be able to 
follow the more concrete and mathematical definition to follow. Also, we 
will look at various levels of stationarity used to describe the various types 
of stationarity characteristics a random process can have. 


Distribution and Density Functions 


In order to properly define what it means to be stationary from a 
mathematical standpoint, one needs to be somewhat familiar with the 
concepts of distribution and density functions. If you can remember your 
Statistics then feel free to skip this section! 


Recall that when dealing with a single random variable, the probability 
distribution function is a simply tool used to identify the probability that 
our observed random variable will be less than or equal to a given number. 
More precisely, let be our random variable, and let be our given value; 
from this we can define the distribution function as 


Equation: 


This same idea can be applied to instances where we have multiple random 
variables as well. There may be situations where we want to look at the 
probability of event and __ both occurring. For example, below is an 
example of a second-order joint distribution function. 

Equation: 


While the distribution function provides us with a full view of our variable 
or processes probability, it is not always the most useful for calculations. 
Often times we will want to look at its derivative, the probability density 
function (pdf). We define the the pdf as 

Equation: 


Equation: 


[link] reveals some of the physical significance of the density function. This 
equations tells us the probability that our random variable falls within a 
given interval can be approximated by . From the pdf, we can now 
use our knowledge of integrals to evaluate probabilities from the above 
approximation. Again we can also define a joint density function which 
will include multiple random variables just as was done for the distribution 
function. The density function is used for a variety of calculations, such as 
finding the expected value or proving a random variable is stationary, to 
name a few. 


Note: The above examples explain the distribution and density functions in 
terms of a single random variable, . When we are dealing with signals 
and random processes, remember that we will have a set of random 
variables where a different random variable will occur at each time 
instance of the random process, . In other words, the distribution and 
density function will also need to take into account the choice of time. 


Stationarity 


Below we will now look at a more in depth and mathematical definition of 
a stationary process. As was mentioned previously, various levels of 
stationarity exist and we will look at the most common types. 


First-Order Stationary Process 


A random process is classified as first-order stationary if its first-order 
probability density function remains equal regardless of any shift in time to 
its time origin. If we let represent a given value at time , then we 
define a first-order stationary as one that satisfies the following equation: 
Equation: 


The physical significance of this equation is that our density function, 
, is completely independent of and thus any time shift, 


The most important result of this statement, and the identifying 
characteristic of any first-order stationary process, is the fact that the mean 
is a constant, independent of any time shift. Below we show the results for a 
random process, _ , that is a discrete-time signal, 

Equation: 


Second-Order and Strict-Sense Stationary Process 


A random process is classified as second-order stationary if its second- 
order probability density function does not vary over any time shift applied 
to both values. In other words, for values and __ then we will have the 
following be equal for an arbitrary time shift 

Equation: 


From this equation we see that the absolute time does not affect our 
functions, rather it only really depends on the time difference between the 
two variables. Looked at another way, this equation can be described as 
Equation: 


These random processes are often referred to as strict sense stationary 
(SSS) when all of the distribution functions of the process are unchanged 
regardless of the time shift applied to them. 


For a second-order stationary process, we need to look at the 
autocorrelation function to see its most important property. Since we have 
already stated that a second-order stationary process depends only on the 
time difference, then all of these types of processes have the following 
property: 

Equation: 


Wide-Sense Stationary Process 


As you begin to work with random processes, it will become evident that 
the strict requirements of a SSS process is more than is often necessary in 
order to adequately approximate our calculations on random processes. We 
define a final type of stationarity, referred to as wide-sense stationary 
(WSS), to have slightly more relaxed requirements but ones that are still 
enough to provide us with adequate results. In order to be WSS a random 
process only needs to meet the following two requirements. 
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Note that a second-order (or SSS) stationary process will always be WSS; 
however, the reverse will not always hold true. 


Random Processes: Mean and Variance 


In order to study the characteristics of a random process, let us look at some 
of the basic properties and operations of a random process. Below we will 
focus on the operations of the random signals that compose our random 
processes. We will denote our random process with X and a random 
variable from a random process or signal by z. 


Mean Value 


Finding the average value of a set of random signals or random variables is 
probably the most fundamental concepts we use in evaluating random 
processes through any sort of statistical method. The mean of a random 
process is the average of all realizations of that process. In order to find 
this average, we must look at a random signal over a range of time (possible 
values) and determine our average from this set of values. The mean, or 
average, of a random process, x(t), is given by the following equation: 
Equation: 


ma(t) = pMe(t) 


This equation may seem quite cluttered at first glance, but we want to 
introduce you to the various notations used to represent the mean of a 
random signal or process. Throughout texts and other readings, remember 
that these will all equal the same thing. The symbol, ,(¢), and the X with 
a bar over it are often used as a short-hand to represent an average, so you 
might see it in certain textbooks. The other important notation used is, 
E|X], which represents the "expected value of X" or the mathematical 
expectation. This notation is very common and will appear again. 


If the random variables, which make up our random process, are discrete or 
quantized values, such as in a binary process, then the integrals become 


summations over all the possible values of the random variable. In this case, 
our expected value becomes 
Equation: 


E[z[n]] = S— aPr[x[n] = a] 


If we have two random signals or variables, their averages can reveal how 
the two signals interact. If the product of the two individual averages of 
both signals do not equal the average of the product of the two signals, then 
the two signals are said to be linearly independent, also referred to as 
uncorrelated. 


In the case where we have a random process in which only one sample can 
be viewed at a time, then we will often not have all the information 
available to calculate the mean using the density function as shown above. 
In this case we must estimate the mean through the time-average mean, 
discussed later. For fields such as signal processing that deal mainly with 
discrete signals and values, then these are the averages most commonly 
used. 


Properties of the Mean 


e The expected value of a constant, a, is the constant: 
Equation: 


Ela| =a 


e Adding a constant, a, to each term increases the expected value by that 
constant: 
Equation: 


E[X +a] = E[X] +a 


e Multiplying the random variable by a constant, a, multiplies the 
expected value by that constant. 


Equation: 
ElaX] = aE X| 


e The expected value of the sum of two or more random variables, is the 
sum of each individual expected value. 
Equation: 


E|X +Y] = E[X]+ Ely] 


Mean-Square Value 


If we look at the second moment of the term (we now look at x? in the 
integral), then we will have the mean-square value of our random process. 
As you would expect, this is written as 

Equation: 


X2 


EX? 
= [Pa fade 


This equation is also often referred to as the average power of a process or 
signal. 


Variance 


Now that we have an idea about the average value or values that a random 
process takes, we are often interested in seeing just how spread out the 
different random values might be. To do this, we look at the variance 
which is a measure of this spread. The variance, often denoted by 0, is 
written as follows: 

Equation: 
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Using the rules for the expected value, we can rewrite this formula as the 
following form, which is commonly seen: 


Equation: 
2 
o Xx? _— (x) 


= E[X?] — (E[X])’ 


| 


Standard Deviation 


Another common statistical tool is the standard deviation. Once you know 
how to calculate the variance, the standard deviation is simply the square 
root of the variance, or o. 


Properties of Variance 


e The variance of a constant, a, equals zero: 
Equation: 


Var(a) = o(a)’ 
= 0 
e Adding a constant, a, to a random variable does not affect the variance 


because the mean increases by the same value: 
Equation: 


Var(X +a) = o(X+a)’ 
= o(X)’ 

e Multiplying the random variable by a constant, a, increases the 

variance by the square of the constant: 

Equation: 

Var (aX) = o(aX)’ 
a2o(X)? 

e The variance of the sum of two random variables only equals the sum 


of the variances if the variable are independent. 
Equation: 


Var (X + Y) 


o(X+Y)’ 
= 0(X)?+0(Y)? 


Otherwise, if the random variable are not independent, then we must 
also include the covariance of the product of the variables as follows: 
Equation: 


Var (X + Y) = 0(X)* + 2Cov(X,Y) +0(Y)’ 


Time Averages 


In the case where we can not view the entire ensemble of the random 
process, we must use time averages to estimate the values of the mean and 
variance for the process. Generally, this will only give us acceptable results 
for independent and ergodic processes, meaning those processes in which 
each signal or member of the process seems to have the same statistical 
behavior as the entire process. The time averages will also only be taken 
over a finite interval since we will only be able to see a finite part of the 
sample. 


Estimating the Mean 


For the ergodic random process, x(t), we will estimate the mean using the 
time averaging function defined as 
Equation: 


X = EX] 
Se ae Gace: 


However, for most real-world situations we will be dealing with discrete 
values in our computations and signals. We will represent this mean as 
Equation: 


Estimating the Variance 


Once the mean of our random process has been estimated then we can 
simply use those values in the following variance equation (introduced in 
one of the above sections) 

Equation: 


Example 


Let us now look at how some of the formulas and concepts above apply to a 
simple example. We will just look at a single, continuous random variable 
for this example, but the calculations and methods are the same for a 


random process. For this example, we will consider a random variable 


having the probability density function described below and shown in 
[link]. 


Equation: 
1 e 
0 otherwise 


Probability Density Function 
f( x) 
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A uniform probability density 
function. 


First, we will use [link] to solve for the mean value. 
Equation: 
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Using [link] we can obtain the mean-square value for the above density 
function. 


Equation: 


a i ai dz 
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10 
= 233.33 
And finally, let us solve for the variance of this function. 


Equation: 
2 
ot = X*- (x) 


= 233.33 — 15? 
8.33 


Correlation and Covariance of a Random Signal 


When we take the expected value, or average, of a random process, we 
measure several important characteristics about how the process behaves in 
general. This proves to be a very important observation. However, suppose 
we have several random processes measuring different aspects of a system. 
The relationship between these different processes will also be an important 
observation. The covariance and correlation are two important tools in 
finding these relationships. Below we will go into more details as to what 
these words mean and how these tools are helpful. Note that much of the 
following discussions refer to just random variables, but keep in mind that 
these variables can represent random signals or random processes. 


Covariance 


To begin with, when dealing with more than one random process, it should 
be obvious that it would be nice to be able to have a number that could 
quickly give us an idea of how similar the processes are. To do this, we use 
the covariance, which is analogous to the variance of a single variable. 


Covariance 
A measure of how much the deviations of two or more variables or 
processes match. 


For two processes, X and Y, if they are not closely related then the 
covariance will be small, and if they are similar then the covariance will be 
large. Let us clarify this statement by describing what we mean by "related" 
and "similar." Two processes are "closely related" if their distribution 
spreads are almost equal and they are around the same, or a very slightly 
different, mean. 


Mathematically, covariance is often written as o,, and is defined as 
Equation: 


cov(A,;Y) = Oy 


BE X-X Y-Y 


This can also be reduced and rewritten in the following two forms: 
Equation: 


Ory = (xy) _ xy 


Equation: 


Ga, = X—-X Y-Y f(a,y)dady 


—oo —oo 


Useful Properties 


e If X and Y are independent and uncorrelated or one of them has zero 
mean value, then 


Ory = 0 


If X and Y are orthogonal, then 
Ory = — (E[XJE[Y]) 
e The covariance is symmetric 
cov (X,Y) = cov(Y, X) 
e Basic covariance identity 
cov (X + Y, Z) = cov(X, Z) + cov (Y, Z) 
e Covariance of equal variables 


cov (X, X) = Var (X) 


Correlation 


For anyone who has any kind of statistical background, you should be able 
to see that the idea of dependence/independence among variables and 
signals plays an important role when dealing with random processes. 
Because of this, the correlation of two variables provides us with a 
measure of how the two variables affect one another. 


Correlation 
A measure of how much one random variable depends upon the other. 


This measure of association between the variables will provide us with a 
clue as to how well the value of one variable can be predicted from the 
value of the other. The correlation is equal to the average of the product of 
two random variables and is defined as 

Equation: 


cor(xX,Y) = E|XY] 


= % *cyf(x,y)dady 


Correlation Coefficient 


It is often useful to express the correlation of random variables with a range 
of numbers, like a percentage. For a given set of variables, we use the 
correlation coefficient to give us the linear relationship between our 
variables. The correlation coefficient of two variables is defined in terms of 
their covariance and standard deviations, denoted by az, as seen below 
Equation: 


_ cov(X,Y) 


O70 y 


where we will always have 


=e ps1 


This provides us with a quick and easy way to view the correlation between 
our variables. If there is no relationship between the variables then the 
correlation coefficient will be zero and if there is a perfect positive match it 
will be one. If there is a perfect inverse relationship, where one set of 
variables increases while the other decreases, then the correlation 
coefficient will be negative one. This type of correlation is often referred to 
more specifically as the Pearson's Correlation Coefficient,or Pearson's 
Product Moment Correlation. 


Positive Negative Uncorrelated (No 
Correlation Correlation Correlation) 
variable 2 variable 2 variable 2 


variable 1] variable 1 


Variable ] 


Types of Correlation 


Note: So far we have dealt with correlation simply as a number relating the 
relationship between any two variables. However, since our goal will be to 
relate random processes to each other, which deals with signals as a 
function of time, we will want to continue this study by looking at 
correlation functions. 


Example 


Now let us take just a second to look at a simple example that involves 
calculating the covariance and correlation of two sets of random numbers. 
We are given the following data sets: 


x = {3,1,6,3,4} 
y = {1,5, 3, 4, 3} 


To begin with, for the covariance we will need to find the expected value, or 
mean, of x and y. 


e= —(3+1+6+3+4)=3.4 


y= —(14+54+344+4+3) =3.2 
1 
ry = ~ (3 +5+18+12+ 12) = 10 


Next we will solve for the standard deviations of our two sets using the 
formula below (for a review click here). 


o= E (X-E{X))’ 
1 
Tr= = (0.16 + 5.76 + 6.76 + 0.16 + 0.36) = 1.625 


1 
oy=  & (4.84 + 3.24 + 0.04 + 0.64 + 0.04) = 1.327 


Now we can finally calculate the covariance using one of the two formulas 
found above. Since we calculated the three means, we will use that formula 
since it will be much simpler. 


Ory = 10 —3.4 x 3.2 = —0.88 


And for our last calculation, we will solve for the correlation coefficient, p. 


—0.88 


= ———..~ = — 0.408 
1.625: x 1.327 


p 


Matlab Code for Example 


The above example can be easily calculated using Matlab. Below I have 
included the code to find all of the values above. 


x = [3 1 6 3 4]; 
y= [15 3 4 3]; 
mx = mean(x) 
my = mean(y) 
mxy = mean(x. *y) 


% Standard Dev. from built-in Matlab 
Functions 

std(x,1) 

std(y,1) 


% Standard Dev. from Equation Above 
(same result as std(?,1)) 

sqrt( 1/5 * sum((x-mx).42)) 

sqrt( 1/5 * sum((y-my).42)) 

cov(x,y,1) 


corrcoef(x,y) 


Autocorrelation of Random Processes 


Before diving into a more complex statistical analysis of random signals 
and processes, let us quickly review the idea of correlation. Recall that the 
correlation of two signals or variables is the expected value of the product 
of those two variables. Since our focus will be to discover more about a 
random process, a collection of random signals, then imagine us dealing 
with two samples of a random process, where each sample is taken at a 
different point in time. Also recall that the key property of these random 
processes is that they are now functions of time; imagine them as a 
collection of signals. The expected value of the product of these two 
variables (or samples) will now depend on how quickly they change in 
regards to time. For example, if the two variables are taken from almost the 
same time period, then we should expect them to have a high correlation. 
We will now look at a correlation function that relates a pair of random 
variables from the same process to the time separations between them, 
where the argument to this correlation function will be the time difference. 
For the correlation of signals from two different random process, look at the 
crosscorrelation function. 


Autocorrelation Function 


The first of these correlation functions we will discuss is the 
autocorrelation, where each of the random variables we will deal with 
come from the same random process. 


Autocorrelation 
the expected value of the product of a random variable or signal 
realization with a time-shifted version of itself 


With a simple calculation and analysis of the autocorrelation function, we 
can discover a few important characteristics about our random process. 
These include: 


1. How quickly our random signal or processes changes with respect to 
the time function 


2. Whether our process has a periodic component and what the expected 
frequency might be 


As was mentioned above, the autocorrelation function is simply the 
expected value of a product. Assume we have a pair of random variables 
from the same process, X; = X(t) and X2 = X(t2), then the 
autocorrelation is often written as 

Equation: 


Rx(t1, t2) E|X1X9| 


= fo fe eieef (a1, 22) da day 


The above equation is valid for stationary and nonstationary random 
processes. For stationary processes, we can generalize this expression a 
little further. Given a wide-sense stationary processes, it can be proven that 
the expected values from our random process will be independent of the 
origin of our time function. Therefore, we can say that our autocorrelation 
function will depend on the time difference and not some absolute time. For 
this discussion, we will let 7 = t2 — t;, and thus we generalize our 
autocorrelation expression as 

Equation: 


Rax (t,t +7) = Rex(7) 
= E[X(t)X(t+7)| 


for the continuous-time case. In most DSP course we will be more 
interested in dealing with real signal sequences, and thus we will want to 
look at the discrete-time case of the autocorrelation function. The formula 
below will prove to be more common and useful than [link]: 

Equation: 


And again we can generalize the notation for our autocorrelation function as 
Equation: 


Ryex[n,n+m) = Ry[m] 
= E|X[n|X[n+ ml] 


Properties of Autocorrelation 


Below we will look at several properties of the autocorrelation function that 
hold for stationary random processes. 


e Autocorrelation is an even function for T 


e The mean-square value can be found by evaluating the autocorrelation 
where 7 = O, which gives us 


Ryx(0) = X? 


e The autocorrelation function will have its largest value when 7 = 0. 
This value can appear again, for example in a periodic function at the 
values of the equivalent periodic points, but will never be exceeded. 


Rxx(0) 2 |Rxx(7)| 


e If we take the autocorrelation of a period function, then R,x(7) will 
also be periodic with the same frequency. 


Estimating the Autocorrleation with Time-Averaging 


Sometimes the whole random process is not available to us. In these cases, 
we would still like to be able to find out some of the characteristics of the 


stationary random process, even if we just have part of one sample function. 
In order to do this we can estimate the autocorrelation from a given 
interval, 0 to 7’ seconds, of the sample function. 

Equation: 


However, a lot of times we will not have sufficient information to build a 
complete continuous-time function of one of our random signals for the 
above analysis. If this is the case, we can treat the information we do know 
about the function as a discrete signal and use the discrete-time formula for 
estimating the autocorrelation. 


Equation: 
1 N-m-1 
<n) = = Qe z|n|z[n + m] 
Examples 


Below we will look at a variety of examples that use the autocorrelation 
function. We will begin with a simple example dealing with Gaussian White 
Noise (GWN) and a few basic statistical properties that will prove very 
useful in these and future calculations. 


Example: 
We will let x|n] represent our GWN. For this problem, it is important to 
remember the following fact about the mean of a GWN function: 


E|z|n|| = 0 


x[n] 


Gaussian density 
function. By 
examination, can 
easily see that 
the above 
statement is true 
- the mean equals 
Zero. 


Along with being zero-mean, recall that GWN is always independent. 
With these two facts, we are now ready to do the short calculations 
required to find the autocorrelation. 


Ryx|[n, n+ m] = Ela[n|a2[n + m]] 


Since the function, x|n], is independent, then we can take the product of 
the individual expected values of both functions. 


Ryx[n,n +m] = Elz|[n||Elae[n + ml] 


Now, looking at the above equation we see that we can break it up further 
into two conditions: one when m and n are equal and one when they are 
not equal. When they are equal we can combine the expected values. We 
are left with the following piecewise function to solve: 


2 Elz([n||E£|z[n + mj] if m 40 

sl a2 f al) Ae noel 
We can now solve the two parts of the above equation. The first equation is 
easy to solve as we have already stated that the expected value of x[n] will 
be zero. For the second part, you should recall from statistics that the 


expected value of the square of a function is equal to the variance. Thus we 
get the following results for the autocorrelation: 


Ot in) 


R,|n,n +m] = 
| | - bb afl) 


Or in a more concise way, we can represent the results as 


eit. es m| ao oa? 5[m] 


Crosscotrelation of Random Processes 


Before diving into a more complex statistical analysis of random signals 
and processes, let us quickly review the idea of correlation. Recall that the 
correlation of two signals or variables is the expected value of the product 
of those two variables. Since our main focus is to discover more about 
random processes, a collection of random signals, we will deal with two 
random processes in this discussion, where in this case we will deal with 
samples from two different random processes. We will analyze the 
expected value of the product of these two variables and how they correlate 
to one another, where the argument to this correlation function will be the 
time difference. For the correlation of signals from the same random 
process, look at the autocorrelation function. 


Crosscorrelation Function 


When dealing with multiple random processes, it is also important to be 
able to describe the relationship, if any, between the processes. For 
example, this may occur if more than one random signal is applied to a 
system. In order to do this, we use the crosscorrelation function, where the 
variables are instances from two different wide sense stationary random 
processes. 


Crosscorrelation 
if two processes are wide sense stationary, the expected value of the 
product of a random variable from one random process with a time- 
shifted, random variable from a different random process 


Looking at the generalized formula for the crosscorrelation, we will 
represent our two random processes by allowing U = U(t) and 

V = V(t — 7). We will define the crosscorrelation function as 
Equation: 


Rw(t,t—7) = EUV] 
ff urf(u,v) dvdu 


Just as the case with the autocorrelation function, if our input and output, 
denoted as U(t) and V(t), are at least jointly wide sense stationary, then the 
crosscorrelation does not depend on absolute time; it is just a function of the 
time difference. This means we can simplify our writing of the above 
function as 

Equation: 


Ruy(7) = E[UV] 


or if we deal with two real signal sequences, x{n] and y|n], then we arrive 
at a more commonly seen formula for the discrete crosscorrelation function. 
See the formula below and notice the similarities between it and the 
convolution of two signals: 

Equation: 


Ryy(nm—m) = Rey(m) 
= DP elnlyln — my] 


Properties of Crosscorrelation 


Below we will look at several properties of the crosscorrelation function 
that hold for two wide sense stationary (WSS) random processes. 


e Crosscorrelation is not an even function; however, it does have a 
unique symmetry property: 
Equation: 


Ray(—T) = Rye(7) 


e The maximum value of the crosscorrelation is not always when the 
shift equals zero; however, we can prove the following property 
revealing to us what value the maximum cannot exceed. 
Equation: 


IRry(7)| < — Rrz(0)Ry,(0) 


e When two random processes are statistically independent then we have 
Equation: 


Ray(T) = Rye(7) 


Examples 


Exercise: 


Problem: 


Let us begin by looking at a simple example showing the relationship 
between two sequences. Using [link], find the crosscorrelation of the 
sequences 


ge) = 4.25 0, 0,2, —3,6,.153,0, 0402} 
y[n] = {...,0,0, 1, —2, 4, 1, —3,0,0,...} 
for each of the following possible time shifts: m = {0, 3, —1}. 


Solution: 


1. For m = 0, we should begin by finding the product sequence 
s[n] = x[n]y|[n]. Doing this we get the following sequence: 


sin] = {...,0,0,2, 6, 24, 1, —9,0,0,...} 


and so from the sum in our crosscotrrelation function we arrive at 
the answer of 


R,,(0) = 22 


2. For m = 3, we will approach it the same was we did above; 
however, we will now shift y|n] to the right. Then we can find the 


product sequence s{n] = x|[n]|y|n — 3], which yields 
s|n] = {...,0,0,0,0,0,1, —6,0,0,...} 
and from the crosscorrelation function we atrive at the answer of 
RaQ) = -6 


3. Form = —1, we will again take the same approach; however, we 
will now shift y[7] to the left. Then we can find the product 
sequence s(n] = z|n]y|n + 1], which yields 


s|n] = 1.x iec0; —A4, —12, 6, —3, 0,0,0,...} 
and from the crosscorrelation function we arrive at the answer of 


Ae 13 


Introduction to Adaptive Filters 


In many applications requiring filtering, the necessary frequency response 
may not be known beforehand, or it may vary with time. (Example; 
suppression of engine harmonics in a car stereo.) In such applications, an 
adaptive filter which can automatically design itself and which can track 
system variations in time is extremely useful. Adaptive filters are used 
extensively in a wide variety of applications, particularly in 
telecommunications. 

Outline of adaptive filter material 


1. Wiener Filters 2 optimal (FIR) filter design in a statistical context 

2. LMS algorithm simplest and by-far-the-most-commonly-used 
adaptive filter algorithm 

3. Stability and performance of the LMS algorithm When and how 
well it works 

4. Applications of adaptive filters Overview of important applications 

5. Introduction to advanced adaptive filter algorithms Techniques for 
special situations or faster convergence 


Discrete-Time, Causal Wiener Filter 


Stochastic D2 optimal (least squares) FIR filter design problem: Given a wide-sense stationary (WSS) input signal 
x, and desired signal dy (WSS @ Ely] = Elyx+al, ryz(l) = Elyeznsil, Vk, 1: (ryy(0) < 00)) 


The Wiener filter is the linear, time-invariant filter minimizing E [e?] , the variance of the error. 


As posed, this problem seems slightly silly, since d; is already available! However, this idea is useful in a wide 
cariety of applications. 


Example: 
active suspension system design 


(en «a— y= level of car body ... qd. = constant 


oy W = suspension system 


av Av Ave a ene Ml aoa x, = road level 


Note:optimal system may change with different road conditions or mass in car, so an adaptive system might be 
desirable. 


Example: 
System identification (radar, non-destructive testing, adaptive control systems) 


Exercise: 


Problem: 


Usually one desires that the input signal x; be "persistently exciting," which, among other things, implies 
non-zero energy in all frequency bands. Why is this desirable? 


Determining the optimal length-N causal FIR Weiner filter 


Note: for convenience, we will analyze only the causal, real-data case; extensions are straightforward. 


M-1 
YR = y WILk-l 
1=0 


M-1 Ss M-1 M-1M-1 
argmin E(e”] = B| (4 = vi) | =E (« = S vin] a Ed,” | —2 SS wi E|dprp—1] + ye SS (w; 
1=0 1=0 1=0 m=0 


M-1 M-1M-1 
Ele? = raa(0) —2 x Witdx(l) + > >», WIW ml xx(1 aa m) 
10 


1=0 m=0 
where 


raa(0) = E[{dx”] 
Tax (I) a E\d,X,-1| 
Txx(l— m) = Expr psi—m!| 


This can be written in matrix form as 


Ele”] =raa(0) -2PW" +W7 RW 
where 


Tax(0) 
Tax(1) 
P= 
Tax(M = 1) 
Txx(0)  TPxx(1) Txx(M — 1) 
Teel b) 1 xx(0) 
R= 
: es (5) Fest L) 
Txx(M-1)... 


Txx(1) Txx(0) 
To solve for the optimum filter, compute the gradient with respect to the top weights vector W 


V=-—(2P)+2RW 
(recall ar (ATW) = At, atv (WM W) = 2MW for symmetric M) setting the gradient equal to zero > 
WotR = P > Wop = RP 


Since R is a correlation matrix, it must be non-negative definite, so this is a minimizer. For R positive definite, the 
minimizer is unique. 


Practical Issues in Wiener Filter Implementation 


The weiner-filter, Wo» = R~'P, is ideal for many applications. But 


several issues must be addressed to use it in practice. 
Exercise: 


Problem: 


In practice one usually won't know exactly the statistics of x; and d; 
(i.e. R and P) needed to compute the Weiner filter. 


How do we surmount this problem? 
Solution: 


Estimate the statistics 


{No 
T(t} = TT LEL EI 
N 4 0 
1 N-1 
rxa(l) ~ — a 
N 40 


then solve Wo, = R-' = P 
Exercise: 


Problem: 
In many applications, the statistics of x,, dy, vary slowly with time. 


How does one develop an adaptive system which tracks these changes 
over time to keep the system near optimal at all times? 


Solution: 


Use short-time windowed estiamtes of the correlation functions. 


Note: 


() k 1 N-1 
Tox(l) == Lk—-m&Lk—m-—l 
N m 0 
k 1 N-1 
ta(). = WV iy ee 18 
m 0 


and Wop ~ Re Px 


Exercise: 


Problem: How can r*, (1) be computed efficiently? 
Solution: 


Recursively! 


rk (1) = fa) + LELE-1 — Lk-NXLk-N-1 


This is critically stable, so people usually do 


Exercise: 


Problem: how does one choose N? 


Tradeoffs 


Larger NV — more accurate estimates of the correlation values — better 
W opt. However, larger NV leads to slower adaptation. 


Note:The success of adaptive systems depends on z, d being roughly 
Stationary over at least NV samples, N > M. That is, all adaptive filtering 
algorithms require that the underlying system varies slowly with respect to 
the sampling rate and the filter length (although they can tolerate 
occasional step discontinuities in the underlying system). 


Computational Considerations 


As presented here, an adaptive filter requires computing a matrix inverse at 
each sample. Actually, since the matrix R is Toeplitz, the linear system of 
equations can be sovled with O M? computations using Levinson's 


algorithm, where / is the filter length. However, in many applications this 
may be too expensive, especially since computing the filter output itself 
requires O(M/) computations. There are two main approaches to resolving 
the computation problem 


1. Take advantage of the fact that R*+! is only slightly changed from R* 
to reduce the computation to OM); these algorithms are called Fast 
Recursive Least Squareds algorithms; all methods proposed so far 
have stability problems and are dangerous to use. 

2. Find a different approach to solving the optimization problem that 
doesn't require explicit inversion of the correlation matrix. 


Note:Adaptive algorithms involving the correlation matrix are called 
Recursive least Squares (RLS) algorithms. Historically, they were 
developed after the LMS algorithm, which is the slimplest and most widely 
used approach O(M).O M? RLS algorithms are used in applications 


requiring very fast adaptation. 


Quadratic Minimization and Gradient Descent 


Quadratic minimization problems 


The least squares optimal filter design problem is quadratic in the filter 
coefficients: 


Ele | T £ cR 


If R is positive definite, the error surface & le | w w wu isa 
N 


unimodal "bowl" in 


The problem is to find the bottom of the bowl. In an adaptive filter context, 
the shape and bottom of the bowl may drift slowly with time; hopefully 
slow enough that the adaptive algorithm can track it. 


For a quadratic error surface, the bottom of the bowl can be found in one 
step by computing R . Most modern nonlinear optimization methods 
(which are used, for example, to solve the i optimal IIR filter design 
problem!) locally approximate a nonlinear function with a second-order 
(quadratic) Taylor series approximation and step to the bottom of this 
quadratic approximation on each iteration. However, an older and simpler 
appraoch to nonlinear optimaztion exists, based on gradient descent. 
Contour plot of ¢-squared 


The idea is to iteratively find the minimizer by computing the gradient of 
the error function: & as The gradient is a vectorin ™ pointing 
in the steepest uphill direction on the error surface at a given point  *, 
with having a magnitude proportional to the slope of the error surface in 


this steepest direction. 


By updating the coefficient vector by taking a step opposite the gradient 
direction: ‘ ‘pw *, we go (locally) "downhill" in the steepest 
direction, which seems to be a sensible way to iteratively solve a nonlinear 
optimization problem. The performance obviously depends on py; if pz is too 
large, the iterations could bounce back and forth up out of the bowl. 
However, if yz is too small, it could take many iterations to approach the 
bottom. We will determine criteria for choosing y later. 


In summary, the gradient descent algorithm for solving the Weiner filter 
problem is: 


The gradient descent idea is used in the LMS adaptive fitler algorithm. As 
presented, this alogrithm costs O (M ) computations per iteration and 
doesn't appear very attractive, but LMS only requires O M computations 
and is stable, so it is very attractive when computation is an issue, even 
thought it converges more slowly then the RLS algorithms we have 
discussed so far. 


The LMS Adaptive Filter Algorithm 


Recall the Weiner filter problem 


{xx}, {d,} jointly wide sense stationary 


Find W minimizing & lex? | 


The superscript denotes absolute time, and the subscript denotes time or a 
vector index. 


the solution can be found by setting the gradient 0 
Equation: 


ze _ 9Elex’| 
Vn ow 


= E[2ex (—X"*)| 
= B|-2 (a, = x" wW,)X*| 


= (28 [d.X*)) + E[x*")w 
= 2P+2RW 


=> (Woot a eee) 


Alternatively, Wp, can be found iteratively using a gradient descent 
technique 


w+! — wk— pve 


In practice, we don't know R and P exactly, and in an adaptive context they 
may be slowly varying with time. 


To find the (approximate) Wiener filter, some approximations are necessary. 
As always, the key is to make the right approximations! 


Note:Approximate R and P: > RLS methods, as discussed last time. 


Note: Approximate the gradient! 


- 0 Elex”| 


Vy 
OW 


Note that €;, itself is a very noisy approximation to [eEx?| . We can get a 
noisy approximation to the gradient by finding the gradient of €,2! Widrow 
and Hoff first published the LMS algorithm, based on this clever idea, in 
1960. 


=e} (—x*) =— (2e,.X*) 


This yields the LMS adaptive filter algorithm 


Example: 
The LMS Adaptive Filter Algorithm 


i We X* — Se WEE K-i 

2-6 =p = Yk 

3. WE = Wh pk = We (—2e,X*) = W* + 2ye,X* ( 
wert = wk + 2uene ri) 


The LMS algorithm is often called a stochastic gradient algorithm, since 


Vi isa noisy gradient. This is by far the most commonly used adaptive 
filtering algorithm, because 


. it was the first 

. it is very simple 

. in practice it works well (except that sometimes it converges slowly) 

. it requires relatively litle computation 

. it updates the tap weights every sample, so it continually adapts the 
filter 

. it tracks slow changes in the signal statistics well 


um BWN Re 


o>) 


Computational Cost of LMS 


To Compute > Yk Ek wet = Total 
multiplies M 0 M+1 2M+1 
adds M-1 1 M 2M 


So the LMS algorithm is O(M) per sample. In fact, it is nicely balanced in 
that the filter computation and the adaptation require the same amount of 
computation. 


Note that the parameter ps plays a very important role in the LMS algorithm. 
It can also be varied with time, but usually a constant pz ("Convergence 
weight facor") is used, chosen after experimentation for a given application. 


Tradeoffs 
large ps: fast convergence, fast adaptivity 


small jw: accurate W — less misadjustment error, stability 


First Order Convergence Analysis of the LMS Algorithm 


Analysis of the LMS algorithm 


It is important to analyze the LMS algorithm to determine under what conditions it 
is stable, whether or not it converges to the Wiener solution, to determine how 
quickly it converges, how much degredation is suffered due to the noisy gradient, 
etc. In particular, we need to know how to choose the parameter pu. 


Mean of W 


does W *, k —> oo approach the Wiener solution? (since W* is always somewhat 
random in the approximate gradient-based LMS algorithm, we ask whether the 
expected value of the filter coefficients converge to the Wiener solution) 
Equation: 

E [wet] = Wl 
E|W* + 2ue,X*| 
WE + 2uE[deX*] + 2uB|—((W*X*) x*)] 


| 


| 


WF + 2uP+— (2uz|(W*"x*)x*|) 


Patently False Assumption 


X* and X*-*, X* and d*-*, and d; and d;_,; are statistically independent, 7 4 0. 
This assumption is obviously false, since X *~! is the same as X * except for 
shifting down the vector elements one place and adding one new sample. We 
make this assumption because otherwise it becomes extremely difficult to analyze 
the LMS algorithm. (First good analysis not making this assumption: Macchi and 
Eweda) Many simulations and much practical experience has shown that the 
results one obtains with analyses based on the patently false assumption above are 
quite accurate in most situations 


With the independence assumption, W * (which depends only on previous X*~*, 
d*~*) is statitically independent of X*, and we can simplify E (w* "xh ) xX | 


Now (wx 5) X* is a vector, and 


Equation: 
T ee 
E|(w* Xt) x = El) ye) when itey 
= | yey! Blwfay iti] 
= |) (wt) Flan ivi-j] 


ea: WET xx(4 = ij) 


— RW 


Tl . : 
where R = E [x kXk | is the data correlation matrix. 


Putting this back into our equation 
Equation: 


WHI = Wk42uP+— (2uzw*) 


= [W'+2uP 


Now if W*~> converges to a vector of finite magnitude ("Convergence in the 
mean"), what does it converge to? 


If W* converges, then as k + oo, W*+1 ~ W*, and 


Wee SIS PP 
2pRW S = QP 


RW = P 


or 


Woo = RP 


pt — 
the Wiener solution! 


So the LMS algorithm, if it converges, gives filter coefficients which on average 
are the Wiener coefficients! This is, of course, a desirable result. 


First-order stability 


But does W* converge, or under what conditions? 


Let's rewrite the analysis in term of V *, the "mean coefficient error vector' 


VE = W* — Wopt, where Wopt is the Wiener filter 


Wl — WE — 2uRWE + 2uP 


Wiel Wopt = Wwe = Wopt Deans (2uzW*) oe 2URWopt a 2uRW opt + 2uP 


VEH = VE — Qu RVE + — (2uRWopt) + 2uP 


Now Wot = R71, so 


Viel = pk 2uRVE 4+ — (2uRR'P) + 2uP = (I- 2R)V* 


We wish to know under what conditions V*~~ — 0? 


Linear Algebra Fact 


Since RF is positive definite, real, and symmetric, all the eigenvalues are real and 
positive. Also, we can write R as Q~ ‘AQ, where A is a diagonal matrix with 
diagonal entries A; equal to the eigenvalues of R, and Q is a unitary matrix with 
rows equal to the eigenvectors corresponding to the eigenvalues of R. 


Using this fact, 
yr = (I - Qu (Q-'AQ))V* 


multiplying both sides through on the left by @: we get 


QV = (Q — uAQ)V* = (1 — 22A)QV* 
LetV’ = QV: 
Vr Sr 2A © 


Note that V is simply V in a rotated coordinate set in R™, so convergence of V 
implies convergence of V. 


Since 1 — 2A is diagonal, all elements of V evolve independently of each other. 
Convergence (stability) bolis down to whether all M of these scalar, first-order 
difference equations are stable, and thus —> (0). 

Vi,i = [1,2,...,M]: (V,"*" = (1 — 2pa,)V, *) 
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These equations converge to zero if |1 — 2uA;| < 1, or Vi : (|wA;| < 1) wand A; 
are positive, So we require V2 : (u < 5 so for convergence in the mean of the 


LMS adaptive filter, we require 
Equation: 


This is an elegant theoretical result, but in practice, we may not know Amax, it may 
be time-varying, and we certainly won't want to compute it. However, another 
useful mathematical fact comes to the rescue... 


Since the eigenvalues are all positive and real. 


For a correlation matrix, Vi,z € {1, M}: (ri = r(0)). So 
tr (R) = Mr(0) = ME|xz2x;]. We can easily estimate r(0) with O(1) 
computations/sample, so in practice we might require 


as a conservative bound, and perhaps adapt yz accordingly with time. 


Rate of convergence 


Each of the modes decays as 


ee 


Note: The initial rate of convergence is dominated by the fastest mode 
1 — 2uAmax. This is not surprising, since a dradient descent method goes 
"downhill" in the steepest direction 


Note: The final rate of convergence is dominated by the slowest mode 
1 — 2uAmin. For small Amin, it can take a long time for LMS to converge. 


Note that the convergence behavior depends on the data (via R). LMS converges 
relatively quickly for roughly equal eigenvalues. Unequal eigenvalues slow LMS 
down a lot. 


Adaptive Equalization 


Note: Design an approximate inverse filter to cancel out as much distortion 
as possible. 


A 


In principle, WH ~ z~4, or W ~ on , so that the overall response of the 
top path is approximately 6(n — A). However, limitations on the form of 
W (FIR) and the presence of noise cause the equalization to be imperfect. 


Important Application 


Channel equalization in a digital communication system. 


Decision 


A 
Ss, (Chanse) Matched Filter S. 


If the channel distorts the pulse shape, the matched filter will no longer be 
matched, intersymbol interference may increase, and the system 


performance will degrade. 


An adaptive filter is often inserted in front of the matched filter to 
compensate for the channel. 


This is, of course, unrealizable, since we do not have access to the original 
transmitted signal, sx. 


There are two common solutions to this problem: 


1. Periodically broadcast a known training signal. The adaptation is 
switched on only when the training signal is being broadcast and thus 
S;, iS known. 

2. Decision-directed feedback: If the overall system is working well, then 
the output s,_~ should almost always equal s,_,~ . We can thus use 
our received digital communication signal as the desired signal, since 
it has been cleaned of noise (we hope) by the nonlinear threshold 
device! 

Decision-directed equalizer 


Delay of A 


As long as the error rate in §, is not too high (say 75%), this method 
works. Otherwise, d; is so inaccurate that the adaptive filter can never 
find the Wiener solution. This method is widely used in the telephone 
system and other digital communication networks. 


